Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models

dc.contributor.author: Laszlo, Bogdan
dc.contributor.department: University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science
dc.contributor.department: Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori
dc.date.accessioned: 2024-06-20T17:06:07Z
dc.date.available: 2024-06-20T17:06:07Z
dc.date.issued: 2024-06-20
dc.description.abstract: This thesis investigates the use of the GPT-4 large language model to generate high-quality, diverse synthetic dialogue datasets for training Natural Language Understanding (NLU) models in task-oriented dialogue systems. Employing a schema-guided framework and prompt engineering, the study examines whether synthetic data can replace real-world data, focusing on domain classification, active intent classification, and slot multi-labelling. Results show that while synthetic datasets can moderately match real-world data, issues such as data quality and annotation inconsistency persist.
dc.identifier.uri: https://hdl.handle.net/2077/81885
dc.language.iso: eng
dc.setspec.uppsok: HumanitiesTheology
dc.subject: Language Technology
dc.title: Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models
dc.title.alternative: Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models
dc.type: Text
dc.type.degree: Student essay
dc.type.uppsok: H2

Files

Original bundle

Name: Creating Synthetic Dialogue Datasets for NLU Training_Revised.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 4.68 KB
Format: Item-specific license agreed to upon submission