Masteruppsatser / Master in Language Technology
Permanent URI for this collection: https://gupea-staging.ub.gu.se/handle/2077/61848
Browsing Masteruppsatser / Master in Language Technology by Author "Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori"
Item: Adaptive Game-Based Swedish Language Learning: A Hybrid AI Approach to Content Generation (2025-09-25)
Geng, Tianyi; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This thesis tests the performance of LLMs on pre-generated L2 Swedish learning content across beginner to intermediate CEFR levels (A1–B2) by employing them in a self-developed language-learning game (comprising language learning and self-validated language assessments). The study addresses a significant research gap in Swedish L2 speaking assessment and learning by providing two systematic evaluation components, each with its own framework: first, of LLM capabilities for CEFR-aligned Swedish content generation, and second, of a self-developed game-based oral Swedish L2 learning system that uses LLM-generated content. For the content-generation evaluation, we assessed learning content generated by three state-of-the-art models (GPT-4, Claude, LLaMA 3.3-70B-Instruct) across 10 topics and 4 CEFR levels, totaling 360 sets, using a multi-dimensional framework. The results revealed that while each model has distinct strengths (LLaMA excelling in learner experience and CEFR accuracy, Claude in technical integration, and ChatGPT in balanced performance), there are no considerable overall differences in performance. Machine-automated CEFR prediction shows exact accuracy rates below 20% for all models, with a gap of at most 5 percentage points between them, while adjacent accuracy ranges from 60% to 70%, indicating similar fundamental capabilities and limitations in CEFR-aligned content generation across all selected models. For the system evaluation, the LLM-generated content was integrated into "LinGo Town", a game-based Swedish L2 speaking-learning system centered on real-time speech assessment and adaptive difficulty features.
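The exact and adjacent accuracy figures reported for CEFR prediction can be illustrated with a small sketch (our own illustration, not code from the thesis; the function name and data are invented):

```python
# Sketch: "exact" vs "adjacent" accuracy for ordinal CEFR labels, where a
# prediction one level off (e.g. A2 predicted for B1) still counts as adjacent.
CEFR_LEVELS = ["A1", "A2", "B1", "B2"]

def cefr_accuracies(gold, predicted):
    """Return (exact_accuracy, adjacent_accuracy) for parallel label lists."""
    idx = {level: i for i, level in enumerate(CEFR_LEVELS)}
    exact = sum(g == p for g, p in zip(gold, predicted))
    adjacent = sum(abs(idx[g] - idx[p]) <= 1 for g, p in zip(gold, predicted))
    n = len(gold)
    return exact / n, adjacent / n

exact, adj = cefr_accuracies(["A1", "B1", "B2", "A2"], ["A1", "B2", "B2", "B2"])
# exact counts only A1==A1 and B2==B2 (2/4); adjacent also counts B1~B2 (3/4)
```

This separation explains how a model can sit below 20% exact accuracy yet reach 60-70% adjacent accuracy: most of its errors are off by a single level.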
The evaluation framework incorporates pre-/post-assessments, in-game and post-game user questionnaires, and quantitative analysis of system performance. Experimental results with 20 participants demonstrated measurable improvements in pronunciation-related skills, with small to medium effect sizes for speaking accuracy and fluency development. This research provides empirical evidence for the potential of adaptive L2 learning systems assisted by LLM-generated content, which is especially beneficial for low-resource language learning such as L2 Swedish speaking, while identifying specific limitations and opportunities for improvement in CEFR-aligned content generation.

Keywords: Swedish L2 learning, Large Language Models, CEFR-aligned content generation, game-based learning, Swedish speech assessment, ICALL

Item: An Experimental Evaluation of Grounding Strategies for Conversational Agents (2020-09-11)
Zou, Yiqian; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

With the continuous development of technology, dialogue systems increasingly penetrate everyday life, and grounding becomes more and more important for them. It is important to choose a suitable grounding strategy for a conversational agent. Two grounding strategies are compared in this article: explicit feedback and implicit feedback. The explicit feedback in this article differs from interrogative explicit feedback; it has been modified so that the system says "Ok, x" in response to utterance x. The aim of this paper is to compare the two grounding strategies and find out which one is better. Additionally, how users respond to false feedback is also a research question in this article. In order to draw a conclusion, a dialogue system was implemented. This article uses a mix of quantitative and qualitative methods.
Questionnaires were used to investigate the subjective judgments of participants, who evaluated the dialogue system from two aspects: naturalness and ease. The system was officially available from June 8th to 14th. The data were analyzed with a t-test and the results are presented in this article with diagrams. Most participants mentioned in the evaluation that they preferred the system with explicit feedback. According to the average scores, the system with explicit feedback is more natural and easier to communicate with than the system with implicit feedback. However, there is no significant difference between the two grounding strategies according to the t-test results. This does not mean that there are no differences, but that such differences may not be visible because of the small sample size. In addition, users' responses to false feedback are summarized in this article; four kinds of reactions are described: hesitation, repetition, pointing out the wrong feedback, and correction.

Item: Automatic Idiomatic Expression Detection. Comparison Between GPT-4 and Gemini Pro Prompt Engineering & LSTM-RNN Construction (2024-06-18)
Hakkarainen, Stanislav; Engelbrecht, Katharina; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This thesis explores the detection of non-literal phrases using Large Language Models (LLMs) such as GPT-4 and Gemini Pro, as well as Recurrent Neural Networks (RNNs), LSTM and BiLSTM models in particular. Through a series of individual experiments and cross-validations, it was discovered that both LLMs demonstrated satisfactory capabilities in identifying idiomatic expressions, with some variance across sentences.
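Precision and recall figures like those reported for these idiom-detection models follow the standard definitions over true positives, false positives and false negatives; a minimal sketch (our illustration, with invented counts):

```python
# Precision: of the phrases flagged as idiomatic, how many really were.
# Recall: of the truly idiomatic phrases, how many were flagged.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented counts purely for illustration:
p, r = precision_recall(tp=19, fp=1, fn=5)  # 19/20 = 0.95 precision
```

The trade-off visible in the abstract (one model gaining recall while losing precision) corresponds to moving counts between fp and fn.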
Additionally, it was observed that Gemini Pro slightly outperformed GPT-4 in the separate validation in terms of precision and recall: Gemini Pro scored at best 95% precision and 81% recall, while GPT-4 scored at best 87% precision and 88% recall. During cross-validation, however, GPT-4 improved (88% precision, 90% recall), whereas Gemini Pro's precision dropped to 83% while its recall improved to 95%. Among the RNNs, the BiLSTM outperformed the LSTM on the idiom-detection task by a significant margin, scoring 95% precision and 90% recall against its counterpart's 79% precision and 25% recall, showing that a bidirectional approach is better suited to sequential data such as idiomatic expressions. In summary, specialized model architectures such as LSTM modules proved preferable to general-purpose LLMs in the domain of idiomatic expression detection.

Item: Breaking Barriers: Enhancing Universal Dependency Parsing for Amharic. Advancing NLP for a Low-Resource Language (2025-06-19)
Jembere, Dawit; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This study advances Amharic dependency parsing by expanding and refining the existing Universal Dependencies (UD) treebank (Seyoum, Miyao, and Mekonnen, 2018). As a morphologically rich and under-resourced language, Amharic poses unique challenges in natural language processing (NLP), particularly in syntactic and morphological parsing. Leveraging the UD framework and the transformer-based toolkit Trankit, this work achieves improved parsing accuracy, outperforming the results obtained with the UDPipe and Turku models by Seyoum, Miyao, and Mekonnen (2020) across multiple evaluation metrics.
This result demonstrates that dataset augmentation, coupled with rigorous syntactic validation, can substantially enhance parsing performance and offers a scalable pathway for NLP development in low-resource languages.

Item: CORPUS EXPLORATION AND DIALOGUE SYSTEM DESIGN FOR A VIRTUAL LIBRARIAN (2020-09-01)
Li, Xiao; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This thesis is part of the virtual-librarian project for the City Library Gothenburg (Stadsbibliotek Göteborg), a public city library. The objective of the project is to develop a virtual librarian using machine learning and AI approaches to replace the current webchat solution, reducing the workload of human librarians and increasing satisfaction among patrons. The thesis offers a systematic development approach based on small existing corpora, aimed at small and middle-sized institutions where resources, especially technical development resources, are limited. The methods significantly reduce the workload on the side of the principal: requirement analysis with a narrative interview; topic-session-based annotation with an expandable tag set and without detailed annotation guidelines, which requires less linguistic pre-knowledge and training; and intent identification through corpus analysis with the assignment of priorities. Furthermore, the thesis offers a classification of intents based on patterns of system behavior, which simplifies the formation of a complete intent list. Since Rasa is the provisionally prioritized platform for implementing the virtual librarian, the thesis also includes a short competitive product analysis of the dialogue systems in the Rasa showcase.
Finally, some technical suggestions for the Rasa implementation are given, reflecting the requirements of the City Library Gothenburg.

Item: Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models (2024-06-20)
Laszlo, Bogdan; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This thesis explores using the GPT-4 large language model to generate high-quality, diverse synthetic dialogue datasets for training Natural Language Understanding (NLU) models in task-oriented dialogue systems. By employing a schema-guided framework and prompt engineering, the study examines whether synthetic data can replace real-world data. The research focuses on domain classification, active intent classification, and slot multi-labelling. Results show that while synthetic datasets can moderately match real-world data, issues like quality and annotation inconsistency persist.

Item: Determining linguistic predictor for the classification of subjective cognitive impairment and mild cognitive impairment using machine learning (2020-09-01)
Wang, Tian; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Introduction: Mild Cognitive Impairment (MCI) is a neurological condition characterized by cognitive decline greater than expected for an individual's age and education level. Subjective Cognitive Impairment (SCI) is a self-reported decline in cognitive abilities that is not clinically identified as MCI. Individuals with MCI remain functional in their daily activities (Petersen et al., 1999) and deteriorate at different rates depending on the evaluation methods employed.
More than 50% of these individuals will develop Alzheimer's Disease (AD) within the following five years; however, several will remain stable and never develop AD (Gauthier et al., 2006; Petersen et al., 1999; Petersen et al., 2017). Although there is no cure for AD, early identification of individuals with MCI can enable treatments that delay the progression of the condition (Zucchella et al., 2018). It is therefore of paramount importance to develop reliable, objective diagnostic methods for cognitive impairment that can be conducted at primary care centers and memory clinics to determine whether an individual should seek further professional advice.

Methodology: 90 individuals participated in the study: 23 SCI patients, 31 MCI patients and 36 healthy controls (HC). All participants were between 50 and 79 years old; had acquired Swedish as their first and only language before the age of 5; had similar lengths of education; had no stroke or brain tumor; and had recent neuropsychological test results available for assessment. Connected speech data were elicited with the cookie-theft picture description task (Goodglass & Kaplan, 1983), a standardized test employed in language therapy and evaluation sessions. Participants were recorded and the recordings were manually transcribed into text. The study refined the transcriptions, defined several linguistic features, and employed two annotation tools (Sparv and Parsey Universal) and two statistical measures (accuracy and area under the receiver operating characteristic curve, ROC AUC) to select the superior feature set for the classification tasks. As a side product, an open-source Swedish text annotation tool was deployed to benefit the linguistic research community. A novel feature engineering approach called SVC-Randomized Recursive Feature Elimination (SVC-RRFE) was introduced to select the best features using a Support Vector Machine, binary search and group k-fold cross-validation.
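Group k-fold cross-validation, as used here, keeps all samples from one group (e.g. one speaker) in the same fold, so no speaker ever appears in both training and test data. A minimal sketch (our illustration, not the thesis code; real toolkits such as scikit-learn provide an equivalent `GroupKFold`):

```python
# Group k-fold: assign whole groups to folds round-robin, then hold out one
# fold of groups at a time.
def group_kfold(groups, k):
    """Yield (train_indices, test_indices); `groups` is one label per sample."""
    unique = sorted(set(groups))
    folds = [unique[i::k] for i in range(k)]  # round-robin group assignment
    for held_out in folds:
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test

# Two folds over four speakers: {s1, s3} held out first, then {s2, s4}.
splits = list(group_kfold(["s1", "s1", "s2", "s3", "s3", "s4"], k=2))
```

The point of the design is that classifier scores estimate generalization to unseen speakers, not to unseen utterances from already-seen speakers.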
In the end, 160, 150 and 98 selected features were applied and evaluated in feed-forward neural networks using group 10-fold cross-validation.

Results: With group 10-fold cross-validation, the neural networks (NN) reached 76% mean accuracy, 73% mean ROC AUC and 0.47 mean Matthews correlation coefficient for MCI detection; 71% mean accuracy, 71% mean ROC AUC and 0.40 mean Matthews correlation coefficient for SCI detection; and 75% mean accuracy, 71% mean ROC AUC and 0.39 mean Matthews correlation coefficient for differentiating MCI and SCI speakers. The highest validation accuracies for the three models were 83%, 79% and 84%, respectively. The best features for classifying MCI individuals versus HC were mean word length, words beginning with [mɐ], and words with [ɪp] in second and third position; the three most important features for identifying SCI individuals versus HC were words with [ɑːɡ] in second and third position, words beginning with [mɑː], and words beginning with [jøː]; and MLU, words beginning with [dɛ], and words with [ɑːd] in second and third position were the most important features for differentiating MCI and SCI individuals.

Discussion: Phonology was impaired in patients with MCI and SCI. Specifically, individuals with MCI showed more self-interruptions and produced more long vowels than those with SCI, more unrounded than rounded vowels, and more stops followed by back vowels during the picture description task. Individuals with SCI tended to produce longer utterances than HC and MCI individuals, and more nasal consonants followed by close front vowels. Sparv-annotated data performed better during feature selection, while data analyzed by Parsey Universal reached better results with the neural networks. The study showed that feed-forward neural networks can be used to build models that identify people with MCI and people with SCI.
By employing phonological features, this study provided improved classification of individuals with MCI, supplying additional objective markers that can be employed to identify these individuals for treatment.

Item: DIALOGUE STRATEGIES FOR VOCABULARY LEARNING. User Initiative in Dialogue Systems for Second Language Learning (2022-10-06)
Carrión del Fresno, Andrea; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

When building efficient dialogue systems, a major challenge is recovering from miscommunication. Analyzing human-human interaction reveals repair strategies that can help conversational systems communicate in a natural and effective way. This thesis aims to identify recurring dialogue strategies (conversational patterns) commonly used among second language (L2) learners when acquiring new vocabulary, by analyzing second-language learner corpora. We further provide a simple theoretical model, along with an implementation capable of reproducing the most frequent patterns observed in our data, later embedded in a vocabulary training activity designed for the second-language classroom. We found instances of production problems and code-switching occurring together, caused by limited linguistic competence in the target language. We showed that learners ask (either explicitly or implicitly) for the L2 word or expression they need and, once it is provided, repeat it as part of their strategy for acquiring new L2 vocabulary. We believe the findings of this thesis can be of value to dialogue systems for second language learning.
Future work includes an extended implementation and exploring larger amounts of data.

Item: Don't Mention the Norm (2024-06-17)
Södahl Bladsjö, Tom; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Reporting bias (the human tendency not to mention obvious or redundant information) and social bias (societal attitudes toward specific demographic groups) have both been shown to propagate from human text data to language models trained on such data (Shwartz and Choi, 2020; Paik et al., 2021; Caliskan, Bryson, and Narayanan, 2017; Garg et al., 2018). However, the two phenomena have not previously been studied in combination. This thesis aims to begin filling this gap by studying the interaction between social bias and reporting bias in both human text and language models. We conduct a corpus study of human-written text and find that n-gram frequencies in our chosen corpora show strong signs of reporting bias with regard to socially marked identities, mirroring current discourse in society. The thesis also introduces the MARB dataset for measuring model reporting bias with regard to socially marked attributes. We evaluate ten large pretrained language models on MARB and analyze the results in relation to both corpus frequencies and real-world frequencies. The results suggest a relationship between reporting bias and social bias in language models similar to that identified in human text.
However, this relationship is not as straightforward in language models, and other factors, like sequence length and model vocabulary, are also observed to affect the outcome.

Item: Effect of prompt strategy on the results of Code Generation by LLMs (2025-06-19)
Wang, Yiyi; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Large Language Models (LLMs) have made significant strides in automated code generation. For example, GitHub Copilot, based on the Codex model, was the first to generate complete functions directly from natural language descriptions. However, output quality remains highly dependent on prompt design. This study systematically investigates how different prompt strategies affect the code generated by LLMs and explores optimization strategies for prompt engineering. We conducted experiments using Google Gemini on a single task with four prompt strategies: zero-shot, few-shot with examples, Chain-of-Thought (CoT), and persona-enhanced prompts. Our findings reveal that progressively enriching the prompt from zero-shot to few-shot, then integrating CoT and persona, can significantly improve the syntactic correctness of the generated code. Additionally, we use a code generation benchmark (MBPP) to evaluate the Gemini and DeepSeek-R1 models with the pass@3 metric; this experiment yielded overall pass@3 scores of approximately 70.6% and 79.4%, respectively. Moreover, we compare our DeepSeek-R1 accuracy with existing work using other LLMs such as ChatGPT: at 86.8% accuracy, DeepSeek-R1 performs close to ChatGPT Plus at 87.5%. We therefore conclude that DeepSeek-R1 is among the leading LLMs for code generation. In conclusion, our results show improvements in the syntactic correctness of the model generations.
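The pass@k metric referred to here is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch (our illustration, not necessarily the exact procedure used in the thesis):

```python
import math

# pass@k: probability that at least one of k sampled completions passes the
# unit tests, estimated from n samples of which c passed:
#   pass@k = 1 - C(n - c, k) / C(n, k)
def pass_at_k(n, c, k):
    if n - c < k:  # every size-k draw necessarily contains a passing sample
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

pass_at_k(10, 0, 3)  # no sample passed: 0.0
pass_at_k(10, 1, 3)  # one of ten passed: 0.3
```

Averaging this quantity over all benchmark problems gives the overall pass@3 score reported above.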
These results underscore the critical role of prompt strategy and structure in enhancing LLM code generation performance, providing a solid theoretical and experimental foundation for future research on more complex programming tasks, multi-model comparisons, and large-scale evaluations.

Item: EMBODIED QUESTION ANSWERING IN ROBOTIC ENVIRONMENT. Automatic generation of a synthetic question-answer data-set (2021-11-12)
Aruqi, Ali; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Embodied question answering is the task of asking a robot about objects in a 3D environment. The robot has to navigate the environment, find the entities in question, and then stop to answer the question. The answering system consists of navigation and visual-question-answering components. The agent is trained on a synthetic dataset of question-answer pairs and navigational paths called EQA-MP3D. Each question in the dataset is an executable function that can be run in the environment to yield an answer. EQA-MP3D includes only two types of questions, color and location questions; these could be considered unnatural, and we observe that the question-answer pairs contain biases. Our work extends the dataset by automatically generating size and spatial questions, producing a total of 19,207 question-answer pairs for training and 3,186 for validation.
Our data extension is intended to train the system to answer more question types and enhance its overall ability to perform the task.

Item: EMERGENCE OF REFERRING EXPRESSIONS THROUGH LANGUAGE GAMES (2024-10-25)
Künkele, Dominik; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

There has been a recent focus on how neural agents in language games ground referring expressions in visual 3D scenes. This thesis explores when referring expressions emerge and whether they align with the referring expressions found in natural languages like English. For this, multiple new artificial datasets based on the CLEVR dataset are generated to control precisely for the bias included in the visual scenes, namely the attributes of the target object and the distractors. The datasets and their controlled biases are validated in a series of referring-expression generation and comprehension tasks. A sender and a receiver play language games in which they need to communicate referring expressions to solve the same tasks. For many tasks, they are able to successfully ground referring expressions in their own emerged language. An analysis shows that the emerged referring expressions align only rarely with natural-language referring expressions; however, they share certain features, such as an incremental approach in which some attributes are consistently used more often than others.

Item: EVALUATING CONFIDENCE ESTIMATION IN NLU FOR DIALOGUE SYSTEMS (2022-06-20)
Khojah, Ranim; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Background: Natural Language Understanding (NLU) is an important component of dialogue systems (DS) that makes human utterances understandable to machines. A central aspect of NLU is intent classification.
In intent classification, an NLU receives a user utterance and outputs a list of N ranked hypotheses (an N-best list) for the predicted intent, along with a confidence estimate (a real number between 0 and 1) assigned to each hypothesis.

Objectives: In this study, we perform an in-depth evaluation of the confidence estimation of five NLUs: Watson Assistant, Language Understanding Intelligent Service (LUIS), Snips.ai, and Rasa in two different configurations (Sklearn and DIET). We measure calibration on two levels, rank level (results for specific ranks) and model level (aggregated results across ranks), as well as performance on the model level. Calibration here refers to the relation between confidence estimates and true likelihood, i.e. how useful the confidence estimate associated with a hypothesis is for assessing its likelihood of being correct.

Methodology: We conduct an exploratory case study on the NLUs. We train them on intent classification tasks using a subset of a multi-domain dataset proposed by Liu et al. (2021). We assess calibration at the model and rank levels using reliability diagrams and correlation coefficients with respect to instance-level accuracy, while we measure performance through accuracy and F1-score.
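Calibration of this kind is often summarized with the expected calibration error (ECE): confidences are binned, and the gap between each bin's mean confidence and its empirical accuracy is averaged, weighted by bin size. A minimal sketch (our illustration; the thesis itself reports reliability diagrams and correlation coefficients rather than this exact statistic):

```python
# Expected calibration error over equal-width confidence bins: a perfectly
# calibrated model has mean confidence equal to accuracy in every bin.
def expected_calibration_error(confidences, correct, n_bins=10):
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        members = [i for i, c in enumerate(confidences)
                   if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Degenerate but perfectly calibrated case: fully confident and always right.
expected_calibration_error([1.0, 1.0], [True, True])  # 0.0
```

Each bar of a reliability diagram corresponds to one bin in this loop; ECE collapses the diagram into a single number.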
While the chosen calibration metrics are clearly useful, we also note some limitations and conclude that further investigation is needed to find the optimal calibration metric. It should also be noted that, to some extent, our results rest on the assumption that the chosen calibration metrics are suitable for our purposes.

Item: Evaluating the contribution of framenet to gender-based violence identification. How semantic annotation can be used as a resource for identifying patterns of violence (2024-06-17)
Vicente Dutra, Lívia; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

According to the World Health Organization, one in three women has been a victim of physical or sexual violence by a partner at some point in her life, making Gender-Based Violence (GBV) a global public health concern. The scenario in Brazil is no different. Although healthcare professionals are required to report violence cases, underreporting remains a challenge where GBV is concerned. Different factors contribute to this, such as victims' fear, health professionals' difficulties in identifying episodes of violence, and the lack of support tools for health teams. The lack of integration between Brazil's public information systems only adds to the difficulty of tackling the problem. In response, a collaborative initiative between FrameNet Brasil and Vital Strategies Brasil launched the project "Data Linkage and Frame-Based Textual Analysis for the Identification of Candidate Cases of Gender-Based Violence in Territories".
The goal of the project is to develop tools for early warning and intervention, offering enhanced support to health teams, local authorities, and policymakers by employing linguistic analysis methods to read and map patterns within the open-text fields of electronic medical records completed in health units. Developed within this project, this master's thesis examines whether semantic annotation according to the FrameNet methodology can contribute sufficient information to enhance the identification of potential cases and patterns of gender-based violence. To that end, a quantitative and qualitative evaluation of the outcome of the Data Linkage project was performed, involving a comparative assessment of an SVM model trained with: (1) data from the open-text fields annotated manually and automatically for frames and frame elements, (2) the data in (1) augmented with annotated parameterized data, and (3) parameterized data only, without any annotation. The qualitative evaluation additionally assessed the annotation process for both the manual and the automatic approach. We were thereby able to answer our research question and corroborate the hypothesis that applying the FrameNet methodology can help identify patterns and cases of violence: the quantitative assessment showed that the semantic models gained over 0.3 in F1 score compared to the categorical model, while the qualitative analysis validated the methods employed, suggested improvements, and indicated possible patterns to be further studied in future work.

Item: EVALUATING THE EXTENT OF ETHNIC BIASES IN FINBERT AND EXPLORING DEBIASING TECHNIQUES (2022-10-07)
Suvanto, Minerva; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Language models are becoming increasingly popular.
These models can encode social biases about various groups of people, and the reproduction of biased beliefs can harm the groups they concern. We explore the extent of ethnic biases in the Finnish language model FinBERT. Our work focuses on biases about minority groups in Finland, and we evaluate the extent of biases concerning the ethnic groups Roma, Finnish-Swedish, Sámi, Somali and Russian. To quantify the extent of bias, we use a template-based approach, calculating association scores between ethnicities and biased terms. We find that the model produces biased outcomes about the minority groups Roma and Somali. To mitigate the detected biases, we attempt to debias FinBERT using dropout regularization and self-debiasing. These two debiasing techniques do not produce satisfactory results, and we conclude that debiasing ethnic biases in Finnish language models requires further research.

Item: Expert in the Loop: LLM Assistance for Technical Documentation Writing. Case Study at Saab AB (2025-06-13)
Nieminen, Anni; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This study explores the potential of LLMs in the technical writing process at Saab Aeronautics. The process is investigated by interviewing technical writers and collecting insights on the most challenging tasks and the areas where AI assistance could be beneficial. These experts are involved in several stages of the research project, with the aim of investigating how an LLM could facilitate their tasks. A demonstration dataset is collected with the experts' help, and a parallel corpus of technical procedures is created.
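Low-rank adaptation (LoRA), one of the techniques applied in this case study, avoids updating a full weight matrix during fine-tuning: the frozen weights W gain a trainable low-rank update (alpha / r) * B @ A with r much smaller than the matrix dimensions. A minimal numerical sketch (our illustration with toy matrices, not the thesis code):

```python
# Plain-Python matrix multiply, enough for the toy example below.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# LoRA update: W' = W + (alpha / r) * B @ A, where only B (d x r) and
# A (r x d) are trained and the base weights W stay frozen.
def lora_adapt(W, B, A, alpha, r):
    delta = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights (d = 2)
B = [[0.1], [0.0]]            # d x r with r = 1
A = [[0.0, 0.2]]              # r x d
W_adapted = lora_adapt(W, B, A, alpha=1, r=1)
```

QLoRA applies the same idea on top of a quantized base model, which is what makes memory-efficient fine-tuning of a 7B-parameter model practical.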
The Supervised Instruction Fine-Tuning (SIFT) method is implemented for fine-tuning an LLM (Mistral-7B-Instruct-v0.2), combining Quantized Low-Rank Adaptation (QLoRA) and Low-Rank Adaptation (LoRA) to perform the fine-tuning memory-efficiently. Sampled generations are investigated qualitatively, alongside a small-scale hyperparameter search. Both the experts involved in the data collection and held-out experts take part in the evaluation stage. The results show that the fine-tuned model's outputs are preferred over the base model's outputs 68% of the time. Analysis of the experts' comments reveals that the fine-tuned model outperforms the base model specifically in adhering to the Simplified Technical English (STE) writing standard and in containing fewer hallucinations. This study suggests the potential of fine-tuning LLMs with small but high-quality datasets, and highlights the significance of involving human expertise in such processes for domain-specific needs, such as those at Saab.

Item: EXPLORING LEXICAL SEMANTIC CHANGE IN POLISH USING XL-LEXEME (2024-06-17)
Slowinska, Ewa; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

This thesis concerns Lexical Semantic Change (LSC) and its automatic detection in Polish. Following the findings of Cassotti et al. (2023), it leverages XL-Lexeme, a transformer-based bi-encoder model, to perform LSC detection on the Polish Parliamentary Corpus divided into two time periods: (1) 1919-1961 and (2) 1989-2023. The aim is to examine the performance of XL-Lexeme on a Polish dataset and to state what kinds of change occurred between the two predefined time periods.
The results suggest rather robust performance of XL-Lexeme, coinciding with the judgements of a native speaker of Polish; however, the influence of context and occasional annotation errors hinder the reliability of the results. The types of change detected through close reading include semantic widening and narrowing, as well as shifts in meaning distribution, which are often related to technological and political developments. An additional Word-in-Context (WiC) task performed on a small portion of annotated sentence pairs further confirms XL-Lexeme's strong handling of Polish, yielding a precision as high as 0.971 but a lower recall of 0.684.

Item Fast visual grounding in interaction (2019-10-04) Cano Santín, José Miguel; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

A big challenge in developing situated agents is that they need to be capable of grounding real objects in their environment to representations with semantic meaning, so that these can be communicated to human agents in natural language. de Graaf (2016) developed the KILLE framework, a static camera-based robot capable of learning objects and spatial relations from very few samples, using image-processing algorithms suited to learning from few samples. However, this framework has a major shortcoming: the time needed to recognise an object grows considerably as the system learns more objects, which motivates the design of a more efficient object recognition module. This project investigates a way to improve object recognition in the same robot framework using a neural-network approach suited to learning from very few image samples: Matching Networks (Vinyals et al., 2016).
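The Matching Networks classification rule just mentioned can be sketched as follows: a query embedding is labelled by an attention-weighted vote over a small support set, which is what makes learning from a handful of samples possible. The embeddings below are toy two-dimensional vectors, not real image features, and the labels are invented for the example.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_predict(query, support_embs, support_labels):
    """Label a query by softmax attention over its similarity to each
    support example, summing attention mass per label (Vinyals et al., 2016)."""
    sims = np.array([cosine(query, s) for s in support_embs])
    attn = np.exp(sims) / np.exp(sims).sum()   # softmax attention weights
    scores = {}
    for a, lab in zip(attn, support_labels):
        scores[lab] = scores.get(lab, 0.0) + float(a)
    return max(scores, key=scores.get)

support = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
labels = ["cup", "cup", "ball"]
print(matching_predict(np.array([0.8, 0.2]), support, labels))  # prints "cup"
```

Because classification reduces to comparisons against the support set, adding a new object only requires adding its few samples, with no retraining of the embedding network.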
Our work also investigates how transfer learning from large datasets could improve object recognition performance and speed up learning, both important features for a robot that interacts online with humans. We therefore evaluate the performance of our situated agent with transfer learning from pre-trained models and with different conversational strategies with a human tutor. Results show that the robot system can train models very quickly and achieves very good object recognition performance in small domains.

Item FINDING MEANING IN A HAYSTACK: On How Vision and Language Models Process Figurative Language (2024-11-28) Filippatou, Viktoria; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

Figurative language is an integral part of human communication and everyday life. As a Natural Language Processing task it has long been a focus of research, and it has recently been recast as a vision-and-language task, where multi-modal models seem to outperform uni-modal ones. This thesis explores how a transformer-based vision-and-language model, specifically VisualBERT, understands figurative language (idioms, metaphors, and similes) and examines whether its visual embeddings can be enhanced to align better with figurative meaning. Understanding these alignments is critical for assessing whether such models can truly grasp the abstract and symbolic layers of language, beyond surface-level pattern recognition.
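One way to probe the text-visual alignment question raised in the VisualBERT study above is to measure how much attention the figurative-phrase tokens pay to the visual tokens in a mixed sequence. The sketch below uses a random toy attention matrix rather than real VisualBERT weights, and the token positions are hypothetical; it only illustrates the bookkeeping of such a probe.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n_text, n_vis = 5, 3
# Toy attention head over a sequence of 5 text tokens + 3 visual tokens;
# each row sums to 1, as in a real transformer attention matrix.
attn = softmax(rng.normal(size=(n_text + n_vis, n_text + n_vis)))

phrase_idx = [2, 3]                          # hypothetical idiom token positions
visual_idx = list(range(n_text, n_text + n_vis))

# Average attention mass flowing from the phrase tokens to the visual tokens.
text_to_visual = attn[np.ix_(phrase_idx, visual_idx)].sum(axis=1).mean()
print(f"phrase-to-visual attention mass: {text_to_visual:.3f}")
```

Comparing this quantity between literal and figurative inputs, across layers and heads, is one concrete form such an attention analysis can take.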
Through a series of experiments and attention analysis, this research highlights both the potential and the limitations of a vision-and-language model, illuminating the broader challenges of grounding language in visual contexts.

Item Fine-Tuning Large Language Models for Practical Software Engineering: Case Studies in Automated Patch Generation (2024-10-21) Zhou, Jiayun; University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science; Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori

In recent years, software development has become increasingly complex, posing challenges in problem-solving, code optimization, and error correction. The rise of Artificial Intelligence (AI) and Large Language Models (LLMs) has introduced new opportunities to automate these tasks, transforming code generation, understanding, and maintenance. This study investigates the fine-tuning of LLMs, particularly the DeepSeek Coder 6.7B model, using real business code data from Epiroc, a leading company in the mining and infrastructure industries. The objective is to improve the model's ability to generate code patches that meet evolving business requirements. Fine-tuning strategies, including data preparation and optimization techniques, were applied to enhance the model's accuracy, reliability, and adaptability. The results show significant improvements across multiple metrics, including correctness, maintainability, and efficiency, with the fine-tuned model outperforming the baseline in patch-generation tasks. Challenges related to dataset complexity, long-sequence processing, and resource constraints were addressed through data preprocessing and resource-efficient training methods. This research highlights the potential of LLMs for automating patch generation and improving programming efficiency, providing valuable insights and methodologies for future projects in AI-assisted software development.
The findings lay the groundwork for further advancements in intelligent programming assistants, which promise to enhance the future of software engineering.