Master's Theses / Master in Language Technology
Permanent URI for this collection: https://gupea-staging.ub.gu.se/handle/2077/61848
Recent Submissions
Item: Adaptive Game-Based Swedish Language Learning: A Hybrid AI Approach to Content Generation (2025-09-25). Geng, Tianyi. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis tests the performance of LLMs on pre-generated L2 Swedish learning content across beginner to intermediate CEFR levels (A1–B2) by deploying them in a self-developed language learning game (comprising both language learning and self-validated language assessments). The study addresses a significant research gap in Swedish L2 speaking assessment and learning through two systematic evaluation components, each with its own framework: first, an evaluation of LLM capabilities in CEFR-aligned Swedish content generation, and second, an evaluation of a self-developed game-based oral Swedish L2 learning system built on LLM-generated content. For the content generation evaluation, we assessed learning content generated by three state-of-the-art models (GPT-4, Claude, LLaMA 3.3-70B-Instruct) across 10 topics and 4 CEFR levels, totaling 360 sets, using a multi-dimensional framework. The results revealed that while each model has distinct strengths (LLaMA excelling in learner experience and CEFR accuracy, Claude in technical integration, and ChatGPT in balanced performance), there are no considerable overall differences in performance. Automated CEFR prediction shows exact accuracy rates below 20% for all models, with at most a 5-percentage-point gap between them, while adjacent accuracy ranges from 60% to 70%, indicating similar fundamental capabilities and limitations in CEFR-aligned content generation across all selected models. For the system evaluation, the LLM-generated content was integrated into "LinGo Town", a game-based Swedish L2 speaking learning system centered on real-time speech assessment and adaptive difficulty.
The evaluation framework incorporates pre- and post-assessments, in-game and post-game user questionnaires, and quantitative analysis of system performance. Experimental results with 20 participants demonstrated measurable improvements in pronunciation-related skills, with small to medium effect sizes for speaking accuracy and fluency development. This research provides empirical evidence for the potential of adaptive L2 learning systems assisted by LLM-generated content, which is especially beneficial for low-resource language learning such as L2 Swedish speaking, while identifying specific limitations and opportunities for improvement in CEFR-aligned content generation.

Keywords: Swedish L2 learning, Large Language Models, CEFR-aligned content generation, game-based learning, Swedish speech assessment, ICALL

Item: Effect of Prompt Strategy on the Results of Code Generation by LLMs (2025-06-19). Wang, Yiyi. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Large Language Models (LLMs) have made significant strides in automated code generation. For example, GitHub Copilot, based on the Codex model, was the first tool to generate complete functions directly from natural language descriptions. However, output quality remains highly dependent on prompt design. This study systematically investigates how different prompt strategies affect the code generated by LLMs and explores optimization strategies for prompt engineering. We conducted experiments using Google Gemini on a single task with four prompt strategies: zero-shot, few-shot with examples, Chain-of-Thought (CoT), and persona-enhanced prompts. Our findings reveal that progressively enriching the prompt from zero-shot to few-shot, and then integrating CoT and a persona, can significantly improve the syntactic correctness of the generated code.
Additionally, we use the MBPP code generation benchmark to evaluate the Gemini and DeepSeek-R1 models with the pass@3 metric, yielding overall pass@3 scores of approximately 70.6% and 79.4%, respectively. Moreover, we compare the accuracy of DeepSeek-R1 against existing work using other LLMs such as ChatGPT: DeepSeek-R1 reaches 86.8% accuracy, close to ChatGPT Plus at 87.5%. We therefore conclude that DeepSeek-R1 is among the leading LLMs for code generation. In conclusion, our results show improvements in the syntactic correctness of the model generations and underscore the critical role of prompt strategy and structure in enhancing LLM code generation performance, providing a solid theoretical and experimental foundation for future research on more complex programming tasks, multi-model comparisons, and large-scale evaluations.

Item: Breaking Barriers: Enhancing Universal Dependency Parsing for Amharic. Advancing NLP for a Low-Resource Language (2025-06-19). Jembere, Dawit. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This study advances Amharic dependency parsing by expanding and refining the existing Universal Dependencies (UD) treebank (Seyoum, Miyao, and Mekonnen, 2018). As a morphologically rich and under-resourced language, Amharic poses unique challenges in natural language processing (NLP), particularly in syntactic and morphological parsing. Leveraging the UD framework and the transformer-based toolkit Trankit, this work achieves improved parsing accuracy, outperforming the results obtained with the UDPipe and Turku models by Seyoum, Miyao, and Mekonnen (2020) across multiple evaluation metrics.
This result demonstrates that dataset augmentation, coupled with rigorous syntactic validation, can substantially enhance parsing performance and offers a scalable pathway for NLP development in low-resource languages.

Item: Expert in the Loop: LLM Assistance for Technical Documentation Writing. A Case Study at Saab AB (2025-06-13). Nieminen, Anni. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This study explores the potential of LLMs in the technical writing process at Saab Aeronautics. The technical writing process is investigated by interviewing technical writers and collecting insights on the most challenging tasks and the areas where AI assistance could be beneficial. These experts are involved at several stages of the research project, with the aim of investigating how an LLM could facilitate their tasks. A demonstration dataset is collected with the help of the experts, and a parallel corpus of technical procedures is created. A supervised instruction fine-tuning (SIFT) method is implemented to fine-tune an LLM (Mistral-7B-Instruct-v0.2), combining Quantized Low-Rank Adaptation (QLoRA) and Low-Rank Adaptation (LoRA) to perform the fine-tuning memory-efficiently. Sampled generations are investigated qualitatively, alongside a small-scale hyperparameter search. Both the experts involved in the data collection and held-out experts take part in the evaluation stage. The results show that the fine-tuned model's outputs are preferred over the base model's outputs 68% of the time. Analysis of the experts' comments reveals that the fine-tuned model outperforms the base model specifically in adhering to the Simplified Technical English (STE) writing standard and in containing fewer hallucinations. This study suggests potential in fine-tuning LLMs with small but high-quality datasets.
Additionally, this study highlights the significance of involving human expertise in such processes for domain-specific needs, such as those at Saab.

Item: Training for the Unexpected: Approaching Universal Phone Recognition for Computer-Assisted IPA Transcription of Low-Resource Languages (2025-06-13). Lee Suchardt, Jacob. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: We set out to develop a language-agnostic ASR model for the phonetic transcription of speech into the International Phonetic Alphabet (IPA). While NLP and automatic speech recognition (ASR) have made immense leaps in research and quality, most of the world's languages are still excluded from this development. In the interest of aiding documentation and linguistic work with low-resource languages, we examine the possibility of universal speech-to-IPA (STIPA) transcription by exploring the cross-lingual transfer of STIPA knowledge, as learnt from high-resource languages, to unseen and low-resource languages in zero-shot settings. Our specific goal is the application and evaluation of cross-lingual STIPA to the severely endangered language Sanna (also known as "Cypriot Maronite Arabic", described in, e.g., Borg, 2011).

Item: From Abstract Syntax to Natural Language: Addressing Natural Language Generation Challenges in Arabic Using GFWordNet as a Lexical Resource (2024-11-28). Zarzoura, Mohamed. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis explores the development and evaluation of Arabic natural language generation using the Grammatical Framework (GF) within GFPedia, a framework that generates multilingual content using predefined abstract syntax trees (ASTs) and dynamic placeholders for lexical entries from GFWordNet.
The primary goal is to assess how effectively GF can generate grammatically correct sentences from the ASTs available in GFPedia. The research involves building Arabic lexical resources and integrating them into GFPedia. The system's output is evaluated (a) automatically, using Levenshtein distance to measure deviations from reference texts, and (b) manually, by analyzing grammatical and morphological correctness. The results highlight significant challenges in Arabic sentence generation, including issues with word structure, definiteness, syntactic alignment, and the need for context-aware translations. To address these challenges, the thesis proposes introducing a semantic layer into the GFPedia framework. By leveraging ontological and contextual information from resources like Wikidata, the semantic layer can select appropriate words, word order, sentence types, and other linguistic features based on the semantic content of the information. This approach aims to reduce the dependency on deep knowledge of the Resource Grammar Library (RGL) and language-specific grammar, facilitating a more efficient and scalable content development process. Additionally, the thesis suggests using Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to assist in generating lexical resources.

Item: FINDING MEANING IN A HAYSTACK: On How Vision and Language Models Process Figurative Language (2024-11-28). Filippatou, Viktoria. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Figurative language is an integral part of human communication and everyday life. As a natural language processing task, it has long been a focus of research, and it has recently been recast as a vision-and-language task, where multimodal models seem to outperform unimodal ones.
This thesis explores how a vision-and-language transformer-based model, specifically VisualBERT, understands figurative language (idioms, metaphors, and similes) and examines whether its visual embeddings can be enhanced to align better with figurative meaning. Understanding these alignments is critical for assessing whether such models can truly grasp the abstract and symbolic layers of language, beyond surface-level pattern recognition. Through a series of experiments and attention analyses, this research highlights both the potential and the limitations of a vision-and-language model, illuminating the broader challenges in grounding language in visual contexts.

Item: IMPLEMENTING A GROUNDING MODULE FOR AN NPC: Testing Grounding of Novel Names with State Charts and LLMs (2024-11-12). Astaiza Soriano, Nayat. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: In this work we argue that any system that processes spoken human language should incorporate mechanisms for grounding, as it is an essential part of human communication. To this end, we explore two name-grounding systems conceived as a Non-Playable Character (NPC) in a hypothetical video game stage. The methodology consists of qualitative observation of how these two dialogue systems perform when confronted with an unusual name. The two systems have different structures: one is a rule-based state chart and the other a prompted Large Language Model (LLM). We found that the first system can be improved in terms of flexibility and the second in terms of reliability.
Based on these observations, we propose a hybrid approach for future research.

Item: "MOVE TOWARDS THE BIG BLACK PIANO": HOW FINE-GRAINED FEATURES AFFECT THE GOAL OF NAVIGATION. Improving Salient Landmark Features in an End-to-End System (2024-10-30). Le Glouanec, Bérénice. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Navigational instructions like "Move towards the big black piano" or "Head past the green armchair" are intuitive for humans, as they rely on salient landmarks to guide movement through space. This thesis explores how fine-grained features, such as spatial location, shape, and color, influence the salience of landmarks in navigation systems. Through linguistic analysis of textual descriptions and object recognition using a Faster R-CNN implemented with a bottom-up attention mechanism, we captured key attributes that enhance the clarity of instructions. Our experiments used the Room-to-Room dataset (Anderson et al., 2018), which provides human instructions for indoor navigation, and the Matterport3D environment (Chang et al., 2017), which offers egocentric visual data. By clustering nouns and attributes based on frequency and semantic similarity, we identified important objects and attributes that guide users efficiently. By examining object distribution in skyboxes and mapping instructions to visual scenes, we evaluated whether accessing multiple skybox views (top, back, left, front, right, and bottom) instead of a single, centered view provides additional contextual value in goal-oriented navigation systems. Finally, we extended previous research by applying a bi-directional boost attention mask over salient landmarks within the Seq2Seq LSTM model of Anderson et al. (2018), where our experiments demonstrated significant improvements.
Notably, the dynamic weights in the attention module achieved success rates of 37.65% on seen and 22.22% on unseen data, outperforming the baseline. By using linguistic salience to guide visual attention, we thus improve the navigation task and demonstrate how language refines the model's focus. Future work should continue refining the attention mechanism and explore further strategies, such as integrating additional views, to provide even richer contextual information and further boost navigation accuracy.

Item: WHEN EYES MEET LAUGHTER: Exploring Non-Verbal Cues in Human-Robot Interaction with Furhat (2024-10-25). Giannitzi, Eleni. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Human-robot interaction is becoming increasingly popular, with social robots like Furhat playing key roles in enhancing communication through both verbal and non-verbal cues. This thesis investigates the impact of gaze and laughter coordination in human-robot interactions, focusing on how these non-verbal behaviours, aligned with each other, enhance metrics such as the perceived naturalness, empathy, and human-likeness of the robot. The study builds on existing research on non-verbal communication and further explores how laughter and gaze alignment can improve conversational flow and emotional engagement between humans and robots. Using the social robot Furhat, experiments involving a simulated cooking activity were conducted, in which participants interacted with the robot through dialogue that integrated gaze and gaze-aligned laughter functions. Participants were evenly divided into two experimental groups. Throughout the interaction, participants were recorded and later asked to complete a questionnaire capturing their perceptions and emotional state.
The insights gathered from the experiments highlight interesting trends in both quantitative and qualitative aspects of user experience. Participants who saw Furhat produce gaze and laughter behaviour in line with human behaviour rated Empathy, Naturalness and Authenticity, Naturalness of Laughter, and Compassion higher than those who witnessed the same behaviours in inappropriate contexts. These results show promising potential for designing more human-like social robots capable of meaningful non-verbal communication. The thesis also addresses limitations that may guide future studies.

Item: GENERATING ROUTE DESCRIPTIONS: Automatic Generation of Route Descriptions with Visual Elements from Graphs and Salient Landmarks (2024-10-25). Akhavan, Kamaneh. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Can machines find the shortest route and guide us using intuitive, human-like instructions, such as "move towards the big black piano" or "head past the green armchair"? This thesis investigates the potential of machines to generate navigation instructions that combine the efficiency of graph-based systems with the clarity provided by salient landmarks. Our research focuses on creating a system that determines the shortest path between two points and enriches navigation with human-like, landmark-based descriptions. By integrating allocentric and egocentric perspectives, we aim to improve the quality and naturalness of the generated instructions. To extract salient landmarks, we used data from Bérénice Le Glouanec's project (Le Glouanec, 2024), which employed object recognition techniques including a Faster R-CNN model. This approach allowed us to identify significant visual attributes such as the spatial location, shape, and color of landmarks within the environment.
We integrated these visually salient elements into our system to improve the clarity and relatability of the generated navigation instructions. The system was evaluated by measuring the similarity between machine-generated and human-written instructions, yielding a mean cosine similarity score of 54% and a Jaccard similarity score of 13%. These results indicate a reasonable resemblance to human-generated navigation instructions, demonstrating the system's potential. Future work will focus on expanding the dataset to include more diverse environments, such as outdoor spaces, and on exploring customizable multimodal systems to enhance user experience and accessibility.

Item: EMERGENCE OF REFERRING EXPRESSIONS THROUGH LANGUAGE GAMES (2024-10-25). Künkele, Dominik. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: There has been a recent focus on how neural agents in language games ground referring expressions in visual 3D scenes. This thesis explores when referring expressions emerge and whether they align with the referring expressions found in natural languages like English. For this, multiple new artificial datasets based on the CLEVR dataset are generated to precisely control the biases included in the visual scenes, namely the attributes of the target object and the distractors. The datasets and their controlled biases are validated in a series of referring expression generation and comprehension tasks. A sender and a receiver play language games in which they need to communicate referring expressions to solve the same tasks. For many tasks, they are able to successfully ground referring expressions in their own emerged language. An analysis of the emerged languages shows that the emerged referring expressions align only weakly with natural language referring expressions.
However, they share certain features, such as an incremental approach in which some attributes are consistently used more often than others.

Item: Fine-Tuning Large Language Models for Practical Software Engineering: Case Studies in Automated Patch Generation (2024-10-21). Zhou, Jiayun. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: In recent years, software development has become increasingly complex, posing challenges in problem-solving, code optimization, and error correction. The rise of Artificial Intelligence (AI) and Large Language Models (LLMs) has introduced new opportunities to automate these tasks, revolutionizing code generation, understanding, and maintenance. This study investigates the fine-tuning of LLMs, particularly the DeepSeek Coder 6.7B model, using real business code data from Epiroc, a leading company in the mining and infrastructure industries. The objective is to improve the model's ability to generate code patches that meet evolving business requirements. Fine-tuning strategies, including data preparation and optimization techniques, were applied to enhance the model's accuracy, reliability, and adaptability. The results demonstrate significant improvements across multiple metrics, including correctness, maintainability, and efficiency, with the fine-tuned model outperforming the baseline in patch generation tasks. Challenges related to dataset complexity, long-sequence processing, and resource constraints were addressed through data preprocessing and resource-efficient training methods. This research highlights the potential of LLMs for automating patch generation and improving programming efficiency, providing valuable insights and methodologies for future projects in AI-assisted software development.
The findings lay the groundwork for further advancements in intelligent programming assistants, which promise to enhance the future of software engineering.

Item: PERSONALIZED LANGUAGE LEARNING IN THE AGE OF AI: Leveraging Large Language Models for Optimal Learning Outcomes (2024-06-25). Zerkowska, Anika Milena. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: In a new era marked by technological advancements and the AI boom, language learning is no longer limited to the classroom. The emergence of Large Language Models (LLMs) propels further advancement in language learning and creates space for learners to engage in more personalized learning approaches, with content dynamically adapted to their individual needs. The thesis conducts a series of experiments involving curated learner profiles with two prominent LLMs, ChatGPT and Gemini, to examine whether LLMs can be used in language learning and to what extent they can foster Personalized Language Learning (PLL) approaches. The experiments show that there is significant potential in implementing LLMs within language learning, and that LLMs are capable of personalizing curricula and teaching materials to accommodate diverse learner profiles. In addition, the thesis identifies potential drawbacks, risks, and ethical considerations associated with the integration of LLMs in PLL.

Item: Creating Synthetic Dialogue Datasets for NLU Training.
An Approach Using Large Language Models (2024-06-20). Laszlo, Bogdan. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis explores using the GPT-4 large language model to generate high-quality, diverse synthetic dialogue datasets for training Natural Language Understanding (NLU) models in task-oriented dialogue systems. By employing a schema-guided framework and prompt engineering, the study explores whether synthetic data can replace real-world data. The research focuses on domain classification, active intent classification, and slot multi-labelling. Results show that while synthetic datasets can moderately match real-world data, issues like quality and annotation inconsistency persist.

Item: Automatic Idiomatic Expression Detection. Comparison Between GPT-4 and Gemini Pro Prompt Engineering & LSTM-RNN Construction (2024-06-18). Hakkarainen, Stanislav; Engelbrecht, Katharina. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis explores the detection of non-literal phrases using Large Language Models (LLMs) such as GPT-4 and Gemini Pro, as well as Recurrent Neural Networks (RNNs), in particular LSTM and BiLSTM models. Through a series of individual experiments and cross-validations, it was found that both LLMs demonstrated satisfactory capabilities in identifying idiomatic expressions, with degrees of variance across sentences. Additionally, Gemini Pro slightly outperformed GPT-4 in the separate validation based on precision and recall: Gemini Pro's highest test scores were 95% precision and 81% recall, while GPT-4's were 87% precision and 88% recall. During cross-validation, however, GPT-4 improved whereas Gemini Pro's precision became worse.
GPT-4 scored 88% precision and 90% recall, whereas Gemini Pro dropped to 83% precision but improved to 95% recall. Among the RNNs, the BiLSTM-RNN outperforms the LSTM-RNN on the idiomatic detection task by a significant margin, scoring 95% precision and 90% recall compared with its counterpart's 79% precision and 25% recall, indicating that a bidirectional approach is better suited to sequential data such as idiomatic expressions. To summarize, specialized model architectures such as LSTM modules remain preferable to general-purpose LLMs in the domain of idiomatic expression detection.

Item: Logical Properties of Natural Language Inference: Experiments with Synthetic Data to Study Consequence Relations in LSTMs (2024-06-17). Monteiro, Hélder. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Natural language inference (NLI) datasets are valuable resources for training and benchmarking models that infer entailment relations. However, these datasets are known to have issues, such as lexical biases, that affect the behaviour of the models trained on them. In this thesis, we approach this task experimentally, studying consequence relations and how data augmentation affects the performance of NLI models. We first defined the model, a simple LSTM with an embedding layer, and three scenarios in which we synthesize entailment in a controlled manner from the SNLI corpus. We trained various models and compared their performance using the entailment F1-score and overall accuracy, showing that adding synthetic data provides a middle ground with balanced performance, particularly across different consequence relations.
We found that, under the scenarios we defined, self-entailment decreases the F1-score marginally compared to the original data when tested on the baseline model, followed by the conjunction scenario, in which the premise is augmented with its hypothesis, and finally the scenario in which the hypothesis is augmented with the premise. We conclude by recommending proportions of synthetic data that should be added to these models to make them better at inferring different logical consequence relations.

Item: EXPLORING LEXICAL SEMANTIC CHANGE IN POLISH USING XL-LEXEME (2024-06-17). Slowinska, Ewa. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis focuses on Lexical Semantic Change (LSC) and its automatic detection in Polish. Following the findings of Cassotti et al. (2023), it leverages XL-Lexeme, a transformer-based bi-encoder model, to perform LSC detection on the Polish Parliamentary Corpus divided into two time periods: (1) 1919-1961 and (2) 1989-2023. The aim is to examine the performance of XL-Lexeme on a Polish dataset and to characterize the changes that occurred between the two predefined time periods. The results suggest rather robust performance of XL-Lexeme, coinciding with the judgements of a native speaker of Polish; however, the influence of context and occasional annotation errors hinders the reliability of the results. The types of changes detected through close reading include semantic widening and narrowing as well as changes in meaning distribution, which are often related to technological and political developments.
An additional Word-in-Context (WiC) task performed on a small portion of the annotated sentence pairs further confirms XL-Lexeme's strong handling of Polish, yielding a precision as high as 0.971 but a lower recall of 0.684.

Item: Don't Mention the Norm (2024-06-17). Södahl Bladsjö, Tom. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: Reporting bias (the human tendency not to mention obvious or redundant information) and social bias (societal attitudes toward specific demographic groups) have both been shown to propagate from human text data to language models trained on such data (Shwartz and Choi, 2020; Paik et al., 2021; Caliskan, Bryson, and Narayanan, 2017; Garg et al., 2018). However, the two phenomena have not previously been studied in combination. This thesis aims to begin to fill this gap by studying the interaction between social bias and reporting bias in both human text and language models. We conduct a corpus study of human-written text and find that n-gram frequencies in our chosen corpora show strong signs of reporting bias with regard to socially marked identities, mirroring current discourse in society. The thesis also introduces the MARB dataset for measuring model reporting bias with regard to socially marked attributes. We evaluate ten large pretrained language models on MARB and analyze the results in relation to both corpus frequencies and real-world frequencies. The results suggest a relationship between reporting bias and social bias in language models similar to that identified in human text.
However, this relationship is not as straightforward in language models, and other factors, such as sequence length and model vocabulary, are also observed to affect the outcome.

Item: MACHINE TRANSLATION FROM ANCIENT GREEK TO ENGLISH: EXPERIMENTS WITH OPENNMT (2024-06-17). Kolovou, Ourania. University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science.

Abstract: This thesis focuses on applying neural machine translation (NMT) models to translation from Ancient Greek to English. The rich morphology, syntax, and vocabulary of Ancient Greek, combined with its status as part of a low-resource language pair, pose considerable challenges for translation. Specifically, this study addresses the following question: how can NMT models capture the richness and complexity of the source language? A parallel corpus from the Perseus Digital Library and OPUS [32] / Tatoeba [33] is pre-processed and then divided into training, validation, and test sets. Multiple NMT models were built using the OpenNMT [19] framework, primarily based on recurrent neural network (RNN) architectures. The top-performing model was an RNN-based model with a one-layer encoder-decoder and a "general" attention mechanism. The modest metric scores, including a BLEU score of 8 and a METEOR score of 0.35, expose the model's limitations in capturing morphosyntactic, semantic, and pragmatic details, especially in longer sentences.
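The abstract above reports BLEU and METEOR figures. As a rough illustration of what the BLEU number measures, here is a minimal, self-contained sketch of corpus-level BLEU (clipped n-gram precision up to 4-grams combined with a brevity penalty); the tokenized sentences in the example are invented placeholders, not drawn from the thesis corpus, and a production evaluation would use a standard implementation such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    clipped = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # total hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
            totals[n - 1] += sum(hyp_ng.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # any zero precision zeroes the geometric mean
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)

# Toy example (invented tokens): a perfect match scores 100.
reference = "the cat sat on the mat".split()
print(corpus_bleu([reference], [reference]))  # 100.0
```

A near-miss hypothesis (one token changed) scores well below 100, which is why low-resource pairs like Ancient Greek-English often land in single digits on this scale.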