MULTI-CLASS GRAMMATICAL ERROR DETECTION: Data, Benchmarks and Evaluation Metrics for the First Shared Task on Swedish L2 Data

Casademont Moner, Judit

Abstract

Grammatical Error Detection (GED) is a challenging NLP task that has received little research attention in recent years, especially for Swedish. However, with more L2 (second language) learners than ever before, educational resources for students, such as grammar checking tools, are needed. With this in mind, this Master’s thesis presents the creation of the Swedish MuClaGED (Multi-Class Grammatical Error Detection) dataset, which will be part of a Computational SLA (Second Language Acquisition) shared task and will likely be useful for the future development of multilingual grammatical error detection systems. Once the Swedish MuClaGED dataset has been built, two main experiments are performed on it to test its capabilities and obtain baseline results in preparation for the shared task. In addition, the project explores the advantages, disadvantages and uses of hybrid error detection datasets by training GED models on a combination of original L2 learner data and text corrupted with artificially generated syntactic errors.

Level of examination

Student essay

Date

2022-06-20

Author

Casademont Moner, Judit

Keywords

Grammatical Error Detection, L2 Swedish dataset, synthetic data, shared task

Language

eng
