MULTI-CLASS GRAMMATICAL ERROR DETECTION: Data, Benchmarks and Evaluation Metrics for the First Shared Task on Swedish L2 Data

Date

2022-06-20

Abstract

Grammatical Error Detection (GED) is a challenging NLP task that has received little research attention in recent years, especially for Swedish. Yet with more L2 (second language) learners than ever before, educational resources for students, such as grammar-checking tools, are needed. With this in mind, this Master’s thesis presents the creation of the Swedish MuClaGED (Multi-Class Grammatical Error Detection) dataset, which will be part of a Computational SLA (Second Language Acquisition) shared task and is likely to be useful for the future development of multilingual grammatical error detection systems. Once the dataset has been built, two main experiments are performed on it to test its capabilities and to obtain baseline results in preparation for the shared task. The project also explores the advantages, disadvantages and functionalities of creating hybrid error detection datasets by training GED models on a combination of original L2 learner data and text corrupted with artificially generated syntactic errors.

Keywords

Grammatical Error Detection, L2 Swedish dataset, synthetic data, shared task
