NLP methods for the automatic generation of exercises for second language learning from parallel corpus data
Abstract
Intelligent Computer Assisted Language Learning (ICALL), or Intelligent Computer Assisted Language Instruction (ICALI), is a field of research that combines Artificial Intelligence and Computer Assisted Language Learning (CALL) in order to produce tools that can aid second language learners without human intervention.
The automatic generation of exercises for language learners from a corpus enables students to self-pace their learning activities and offers a theoretically unlimited supply of unmediated and unbiased content.
In recent years, advances in NLP technology and the growth of available resources have brought this possibility closer. Particularly relevant sources of knowledge are the large collections of aligned parallel texts: corpora containing sentences in different languages that can be considered translations of one another.
The present work explores the possibility of extracting candidate sentences and their translations from a parallel corpus and using them to generate exercises for different proficiency levels.
The research was conducted by experimenting with several available NLP tools and qualitatively evaluating the results on a training set of documents, in order to define a pipeline for the language pairs Swedish-English, English-Italian and Swedish-Italian. Finally, a set of 30 random documents was extracted and manually annotated to obtain a quantitative evaluation. The results showed a mean accuracy of 70-90% in sentence selection, depending on the language pair, rising to 80-96% when stricter selection criteria were applied at the cost of lower recall.
It is worth noting that the implementation is mostly language independent: only one component, which estimates the target proficiency level of a sentence, is language specific. In future work the same pipeline could therefore be extended to other language pairs.
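As an illustration of the sentence-selection step and of the accuracy/recall trade-off mentioned above, the following is a minimal sketch and not the pipeline used in the thesis. It assumes the corpus is already sentence-aligned, and the function name `select_candidates`, the length-based criteria and all threshold values are hypothetical choices made only for this example.

```python
from typing import List, Tuple

# (source sentence, target sentence); assumed to be already aligned
Pair = Tuple[str, str]


def select_candidates(
    pairs: List[Pair],
    min_tokens: int = 4,     # hypothetical lower bound on sentence length
    max_tokens: int = 20,    # hypothetical upper bound on sentence length
    max_ratio: float = 1.5,  # hypothetical cap on the source/target length ratio
) -> List[Pair]:
    """Keep aligned pairs whose lengths look suitable for an exercise."""
    selected = []
    for src, tgt in pairs:
        # Whitespace tokenization is a simplification for this sketch.
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if not (min_tokens <= n_src <= max_tokens):
            continue
        if not (min_tokens <= n_tgt <= max_tokens):
            continue
        # A large length mismatch suggests a loose or partial translation.
        if max(n_src, n_tgt) / max(1, min(n_src, n_tgt)) > max_ratio:
            continue
        selected.append((src, tgt))
    return selected


pairs = [
    ("Jag har en röd bil hemma", "I have a red car at home"),
    ("Ja", "Yes"),
    ("Det regnar idag", "It is raining heavily today in the north of the country"),
]

default_selection = select_candidates(pairs)
strict_selection = select_candidates(pairs, max_ratio=1.1)
```

Tightening a threshold such as `max_ratio` discards more pairs, which mirrors in a simplified way how stricter criteria can raise accuracy while lowering recall.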
Degree
Student essay
Date
2020-09-25
Author
Zanetti, Arianna
Keywords
ICALL
language learning
parallel corpus
exercise generation
Language
eng