MULTI-CLASS GRAMMATICAL ERROR DETECTION: Data, Benchmarks and Evaluation Metrics for the First Shared Task on Swedish L2 Data
Abstract
Grammatical Error Detection (GED) is a challenging NLP task that has received little research attention
in recent years, especially for Swedish. However, with more L2 (second language) learners than ever
before, educational resources for students, such as grammar checking tools, are needed. With this in
mind, this Master's thesis presents the creation of the Swedish MuClaGED (Multi-Class Grammatical
Error Detection) dataset, which will be part of a Computational SLA (Second Language Acquisition)
shared task and is likely to be useful for the future development of multilingual grammatical error
detection systems. Once the Swedish MuClaGED dataset has been built, two main experiments are performed
on it to test its capabilities and obtain baseline results in preparation for the aforementioned shared
task. Moreover, this project explores the advantages, disadvantages and functionalities of creating
hybrid error detection datasets by training GED models on a combination of original L2 learners' data
and text corrupted with artificially generated syntactic errors.
Examination level
Student essay
Date
2022-06-20
Author
Casademont Moner, Judit
Keywords
Grammatical Error Detection, L2 Swedish dataset, synthetic data, shared task
Language
eng