An overview of Grammatical Error Correction for the twelve MultiGEC-2025 languages

This overview complements the comprehensive dataset description article for MultiGEC, a dataset for Multilingual Grammatical Error Correction covering twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. It is well known that most publications in Natural Language Processing (NLP) focus on English. While this has historical reasons (ease of publication, greater outreach, increased number of citations, etc.), it leaves other languages at a disadvantage across multiple tasks. The MultiGEC dataset was created as an attempt to counteract this effect. This report gives a historical overview of the evolution of Grammatical Error Correction (GEC) for each of the twelve languages in the dataset and provides context for the work on the dataset and the related MultiGEC-2025 shared task.

Keywords

Grammatical Error Correction, Language Technology, Natural Language Processing, shared task, MultiGEC-2025, Computational SLA

URI

https://hdl.handle.net/2077/84800

Collections

GU-ISS Forskningsrapporter från Institutionen för svenska, flerspråkighet och språkteknologi (2011-)
