In the minds of stochastic parrots: Benchmarking, evaluating and interpreting large language models

dc.contributor.author: Morger, Felix
dc.date.accessioned: 2024-11-18T10:28:08Z
dc.date.available: 2024-11-18T10:28:08Z
dc.date.issued: 2024-11-18
dc.description.abstract: The arrival of large language models (LLMs) in recent years has changed the landscape of natural language processing (NLP). Their impressive performance on popular benchmarks, their ability to solve a range of different tasks and their human-like linguistic interactional abilities have prompted a debate over whether they are just "stochastic parrots" that cleverly repeat what humans say without understanding its meaning, or whether they are acquiring essential language capabilities, which would be an important stepping stone towards artificial general intelligence. To tackle this question, developing analysis methods to measure and understand the language capabilities of LLMs has become a defining challenge. These include benchmarks to reliably measure their performance as well as interpretability methods to gauge their inner workings. This is especially relevant at a time when these models are already having a considerable impact on our society. An increasing number of users are affected by the technology, and calls are being made for transparent, regulated and thorough evaluation of AI. In these efforts, it is important to estimate the possibilities and limitations of these analysis methods, since they will play an important role in holding AI technologies accountable. In this compilation thesis, I expound on the components and processes involved in analyzing LLMs. The included articles use different approaches for analyzing LLMs, from introducing Superlim, a multi-task benchmark for Swedish NLU, to investigating LLMs' ability to predict language variation. To this end, I explore the possibilities and limitations of popular analysis methods and the implications these have for developing LLMs. I argue that integrating explanatory approaches from empirical linguistic research is important for understanding the role of both the data and the linguistic features used when analyzing LLMs. Doing so not only helps guide the development of LLMs, but also brings insights into linguistics. [sv]
dc.gup.defencedate: 2024-12-13
dc.gup.defenceplace: Friday 13 December, 13:15, J330, Humanisten, Renströmsgatan 6 [sv]
dc.gup.department: Department of Swedish, Multilingualism, Language Technology; Institutionen för svenska, flerspråkighet och språkteknologi [sv]
dc.gup.dissdb-fakultet: HF
dc.gup.mail: felix.morger@gu.se [sv]
dc.gup.origin: Göteborgs universitet. Humanistiska fakulteten [swe]
dc.gup.origin: University of Gothenburg. Faculty of Humanities [eng]
dc.identifier.isbn: 978-91-8069-944-0 (PDF)
dc.identifier.isbn: 978-91-8069-943-3 (Print)
dc.identifier.uri: https://hdl.handle.net/2077/83731
dc.language.iso: eng [sv]
dc.relation.haspart: Morger, Felix, Stephanie Brandl, Lisa Beinborn & Nora Hollenstein. 2022. A cross-lingual comparison of human and model relative word importance. In Simon Dobnik, Julian Grove & Asad Sayeed (eds.), Proceedings of the 2022 CLASP Conference on (Dis)embodiment, 11–23. Gothenburg: Association for Computational Linguistics. https://aclanthology.org/2022.clasp-1.2 [sv]
dc.relation.haspart: Morger, Felix. 2024. SweDiagnostics: A diagnostics natural language inference dataset for Swedish. In Pierre Zweigenbaum, Reinhard Rapp & Serge Sharoff (eds.), Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024, 118–124. Torino: ELRA & ICCL. https://aclanthology.org/2024.bucc-1.13/ [sv]
dc.relation.haspart: Morger, Felix. 2023. Are there any limits to English-Swedish language transfer? A fine-grained analysis using natural language inference. In Proceedings of the Second Workshop on Resources and Representations for Under-resourced Languages and Domains (RESOURCEFUL-2023), 30–41. Tórshavn: Association for Computational Linguistics. https://aclanthology.org/2023.resourceful-1.5/ [sv]
dc.relation.haspart: Berdicevskis, Aleksandrs, Gerlof Bouma, Robin Kurtz, Felix Morger, Joey Öhman, Yvonne Adesam, Lars Borin, Dana Dannélls, Markus Forsberg, Tim Isbister, Anna Lindahl, Martin Malmsten, Faton Rekathati, Magnus Sahlgren, Elena Volodina, Love Börjeson, Simon Hengchen & Nina Tahmasebi. 2023. Superlim: A Swedish language understanding evaluation benchmark. In Houda Bouamor, Juan Pino & Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 8137–8153. Singapore: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.506 [sv]
dc.relation.haspart: Morger, Felix. 2024. When Sparv met Superlim…A Sparv plugin for natural language understanding analysis of Swedish. Tech. rep. University of Gothenburg. https://gupea.ub.gu.se/handle/2077/83664 [sv]
dc.relation.haspart: Morger, Felix & Aleksandrs Berdicevskis. 2024. Gauging linguistic variation using LLMs. Unpublished manuscript. [sv]
dc.relation.ispartofseries: 32 [sv]
dc.subject: natural language processing, machine learning, machine learning interpretability, large language models, benchmarking [sv]
dc.title: In the minds of stochastic parrots: Benchmarking, evaluating and interpreting large language models [sv]
dc.type: Text
dc.type.degree: Doctor of Philosophy [sv]
dc.type.svep: Doctoral thesis [eng]

Files

Original bundle

Name: omslag-epublikation.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format
Description: Cover

Name: doctoral-thesis-felixm.pdf
Size: 4.46 MB
Format: Adobe Portable Document Format
Description: Thesis

Name: Spikblad.pdf
Size: 80.77 KB
Format: Adobe Portable Document Format
Description: spikblad

License bundle

Name: license.txt
Size: 4.68 KB
Format: Item-specific license agreed upon to submission
Description: