Arthur Flexer,
"On the validity of employing ChatGPT for distant reading of music similarity"
in: Proceedings of the 25th Int. Society for Music Information Retrieval Conference, San Francisco, United States, 2024
Original title:
On the validity of employing ChatGPT for distant reading of music similarity
Language of the title:
English
Original book title:
Proceedings of the 25th Int. Society for Music Information Retrieval Conference, San Francisco, United States
Original abstract:
In this work we explore whether large language models (LLM) can be a useful and valid tool for music knowledge discovery. LLMs offer an interface to enormous quantities of text and hence can be seen as a new tool for 'distant reading', i.e. the computational analysis of text including sources about music. More specifically we investigated whether ratings of music similarity, as measured via human listening tests, can be recovered from textual data by using ChatGPT. We examined the inferences that can be drawn from these experiments through the formal lens of validity. We showed that correlation of ChatGPT with human raters is of moderate positive size but also lower than the average human inter-rater agreement. By evaluating a number of threats to validity and conducting additional experiments with ChatGPT, we were able to show that especially construct validity of such an approach is seriously compromised. The opaque black box nature of ChatGPT makes it close to impossible to judge the experiment's construct validity, i.e. the relationship between what is meant to be inferred from the experiment, which are estimates of music similarity, and what is actually being measured. As a consequence the use of LLMs for music knowledge discovery cannot be recommended.
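The comparison the abstract describes, correlation between ChatGPT and human raters set against the average agreement among the human raters themselves, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not state which correlation coefficient was used (Pearson is assumed), the function names are hypothetical, and the ratings are invented toy values, not data from the paper.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_inter_rater(ratings):
    """Average pairwise correlation over all pairs of human raters."""
    return mean(pearson(a, b) for a, b in combinations(ratings, 2))


def mean_model_vs_humans(model, ratings):
    """Average correlation between the model and each human rater."""
    return mean(pearson(model, r) for r in ratings)


# Hypothetical toy ratings (NOT data from the paper): each list holds one
# rater's similarity scores for the same five song pairs.
human_ratings = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
model_ratings = [3, 1, 2, 5, 4]  # stand-in for scores obtained from an LLM

if __name__ == "__main__":
    print("human inter-rater:", round(mean_inter_rater(human_ratings), 3))
    print("model vs. humans: ", round(mean_model_vs_humans(model_ratings, human_ratings), 3))
```

With these toy numbers the model's mean correlation with the human raters is positive but falls below the mean human inter-rater correlation, which is the pattern the abstract reports. The sketch says nothing about construct validity, which is the abstract's main concern.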