Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models

dc.contributor.author: Laszlo, Bogdan
dc.contributor.department: University of Gothenburg / Department of Philosophy, Linguistics and Theory of Science
dc.contributor.department: Göteborgs universitet / Institutionen för filosofi, lingvistik och vetenskapsteori
dc.date.accessioned: 2024-06-20T17:06:07Z
dc.date.available: 2024-06-20T17:06:07Z
dc.date.issued: 2024-06-20
dc.description.abstract: This thesis investigates the use of the GPT-4 large language model to generate high-quality, diverse synthetic dialogue datasets for training Natural Language Understanding (NLU) models in task-oriented dialogue systems. Employing a schema-guided framework and prompt engineering, the study examines whether synthetic data can replace real-world data, focusing on domain classification, active intent classification, and slot multi-labelling. Results show that while synthetic datasets can moderately match real-world data, issues such as data quality and annotation inconsistency persist.
dc.identifier.uri: https://hdl.handle.net/2077/81885
dc.language.iso: eng
dc.setspec.uppsok: HumanitiesTheology
dc.subject: Language Technology
dc.title: Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models
dc.title.alternative: Creating Synthetic Dialogue Datasets for NLU Training. An Approach Using Large Language Models
dc.type: Text
dc.type.degree: Student essay
dc.type.uppsok: H2

Files

Original bundle

Name: Creating Synthetic Dialogue Datasets for NLU Training_Revised.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 4.68 KB
Format: Item-specific license agreed to upon submission