An overview of Grammatical Error Correction for the twelve MultiGEC-2025 languages
Date
2025-01-31
Abstract
This overview complements the comprehensive dataset description article for MultiGEC, a dataset for Multilingual Grammatical Error Correction (GEC) covering twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian.
It is well known that most publications in the field of Natural Language Processing (NLP) focus on the English language. While this is due to historical reasons (ease of publication, greater outreach, increased number of citations, etc.), it leaves other languages at a disadvantage across multiple tasks. The MultiGEC dataset was created as an attempt to counteract this effect. This report gives a historical overview of the evolution of GEC for each of the twelve languages in the dataset and provides context for the work on the dataset and the related MultiGEC-2025 shared task.
Keywords
Grammatical Error Correction, Language Technology, Natural Language Processing, shared task, MultiGEC-2025, Computational SLA