Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning
Abstract
With the ever-growing presence of electronic devices in our everyday lives, it is compelling to investigate how technology can contribute to making our language learning process more efficient and enjoyable. A fundamental piece in this puzzle is the ability to measure the complexity of the language that learners are able to deal with and produce at different stages of their progress.
In this thesis, we explore automatic approaches for modeling linguistic complexity at different levels of learning Swedish as a second and foreign language (L2). To this end, we employ natural language processing techniques to extract linguistic features and combine them with machine learning methods. We study linguistic complexity in two types of L2 texts: those written by experts for learners and those produced by learners themselves. Moreover, we apply this type of data-driven analysis to a smaller unit, single sentences.
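To make the combination of extracted linguistic features and a learned level classifier concrete, here is a minimal sketch. It is not the thesis's feature set or model. The four features (average sentence length, average token length, the Swedish LIX readability index, type-token ratio), the naive regex tokenizer and sentence splitter, the toy training texts, and the nearest-centroid classifier (a hypothetical stand-in for a proper supervised learner) are all assumptions chosen for illustration. The sketch also assumes non-empty input texts.

```python
import math
import re
from statistics import mean, pstdev


def tokenize(text):
    # Naive tokenizer: runs of word characters (Unicode-aware, so å, ä, ö are kept).
    return re.findall(r"\w+", text)


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_features(text):
    """Return [avg. sentence length, avg. token length, LIX, type-token ratio]."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    n_sents = len(split_sentences(text))
    long_words = sum(1 for t in tokens if len(t) > 6)
    avg_sent_len = n_tokens / n_sents
    lix = avg_sent_len + 100 * long_words / n_tokens  # LIX: Swedish readability index
    avg_tok_len = mean(len(t) for t in tokens)
    ttr = len({t.lower() for t in tokens}) / n_tokens
    return [avg_sent_len, avg_tok_len, lix, ttr]


class NearestCentroidLevelClassifier:
    """Hypothetical toy classifier: z-score the features, then assign the
    CEFR label whose class centroid is closest in Euclidean distance."""

    def fit(self, texts, labels):
        X = [extract_features(t) for t in texts]
        columns = list(zip(*X))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 for zero-variance features to avoid division by zero.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        Z = [self._scale(x) for x in X]
        self.centroids = {}
        for label in set(labels):
            rows = [z for z, y in zip(Z, labels) if y == label]
            self.centroids[label] = [mean(c) for c in zip(*rows)]
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def predict(self, text):
        z = self._scale(extract_features(text))
        return min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab]))


# Toy, made-up training data (not from any learner corpus).
train_texts = [
    "Jag heter Anna. Jag bor i Lund.",
    "Han har en katt. Katten är liten.",
    "Regeringens omfattande utredning om arbetsmarknadspolitiken presenterades under förmiddagen.",
    "Forskningsresultaten visar att språkinlärningen påverkas av motivationsfaktorer.",
]
train_labels = ["A1", "A1", "C1", "C1"]
clf = NearestCentroidLevelClassifier().fit(train_texts, train_labels)
print(clf.predict("Vi har en hund. Hunden är glad."))  # → A1
```

Scaling the features before measuring distance matters here: without it, LIX (values up to around 100) would dominate the type-token ratio (values between 0 and 1). A real system would use far more features and a stronger learner trained on a labeled corpus.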
Automatic proficiency level prediction has a number of potential applications in the field of Intelligent Computer-Assisted Language Learning, two of which we investigate. First, it can facilitate locating learning materials suitable for L2 learners in corpora, which are a valuable and easily accessible source of authentic language use. We propose a framework for selecting sentences suitable as exercise items which, besides linguistic complexity, encompasses a number of additional criteria such as well-formedness and independence from a larger textual context. An empirical evaluation of the system implemented using these criteria indicated its usefulness in an L2 instructional setting. Second, linguistic complexity analysis enables the automatic evaluation of L2 texts, which, besides being helpful for preparing learning materials, can also be employed for assessing learners' writing. We show that models trained partly or entirely on reading texts can effectively predict the proficiency level of learner essays, especially if some learner errors are automatically corrected in a pre-processing step. Both the sentence selection and the L2 text evaluation systems have been made freely available on an online learning platform.
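The idea of combining several selection criteria for exercise-item candidates can be sketched as a list of independent predicate checks that a sentence must all pass. This is a hypothetical illustration only: the heuristics below (capitalized start and final punctuation as a proxy for well-formedness, a small hand-picked list of sentence-initial pronouns and connectives as a proxy for context dependence, and a token-length window as a stand-in for a level-appropriate complexity check) are assumptions, not the criteria or thresholds of the framework described above.

```python
import re

# Hypothetical, illustrative list of Swedish sentence-initial words that often
# point back to earlier text (pronouns, connectives); not taken from the thesis.
CONTEXT_DEPENDENT_STARTS = {"han", "hon", "den", "dessa", "detta", "därför",
                            "men", "dessutom", "sedan"}


def tokens(sentence):
    return re.findall(r"\w+", sentence)


def is_well_formed(sentence):
    # Crude proxy: capitalized start, final punctuation, balanced parentheses.
    s = sentence.strip()
    return (bool(s) and s[0].isupper() and s[-1] in ".!?"
            and s.count("(") == s.count(")"))


def is_context_independent(sentence):
    # Reject sentences whose first word likely refers to preceding text.
    toks = tokens(sentence)
    return bool(toks) and toks[0].lower() not in CONTEXT_DEPENDENT_STARTS


def has_suitable_length(sentence, min_tokens=5, max_tokens=15):
    # Hypothetical thresholds; length stands in for a complexity check.
    return min_tokens <= len(tokens(sentence)) <= max_tokens


CRITERIA = [is_well_formed, is_context_independent, has_suitable_length]


def select_candidates(sentences, criteria=CRITERIA):
    """Keep only sentences that satisfy every criterion."""
    return [s for s in sentences if all(check(s) for check in criteria)]


# Toy, made-up input sentences.
corpus = [
    "Jag dricker kaffe varje morgon innan jobbet.",
    "Därför valde hon att stanna hemma hela dagen.",  # context-dependent start
    "katten sover i soffan hela eftermiddagen",       # not well-formed
    "Ja.",                                            # too short
]
print(select_candidates(corpus))
# ['Jag dricker kaffe varje morgon innan jobbet.']
```

Keeping each criterion as a separate function makes it easy to add, remove, or swap checks (for example, replacing the length window with a call to a proficiency-level classifier). A production framework might instead score and rank candidates rather than filter them with hard thresholds; this sketch does not claim to reproduce the thesis's approach.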
Parts of work
Pilán, Ildikó, Sowmya Vajjala and Elena Volodina 2016. A readable read: automatic assessment of language learning materials based on linguistic complexity. International Journal of Computational Linguistics and Applications (IJCLA) 7 (1): 143–159.
Pilán, Ildikó 2016. Detecting Context Dependence in Exercise Item Candidates Selected from Corpora. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 151–161.
Pilán, Ildikó, Elena Volodina and Lars Borin 2017. Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. Traitement Automatique des Langues (TAL) Journal, Special issue on NLP for learning and teaching 57 (3): 67–91.
Pilán, Ildikó, Elena Volodina and Torsten Zesch 2016. Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), 2101–2111.
Pilán, Ildikó, David Alfter and Elena Volodina 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners’ writings. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 120–126.
Pilán, Ildikó and Elena Volodina. Investigating the importance of linguistic complexity features across different datasets related to language learning. Submitted.
Degree
Doctor of Philosophy
University
Göteborgs universitet. Humanistiska fakulteten
University of Gothenburg. Faculty of Arts
Institution
Department of Swedish; Institutionen för svenska språket
Disputation
13.15, Stora hörsalen (2150), Eklandagatan 86
Date of defence
2018-06-14
ildiko.pilan@gmail.com
Date
2018-05-17
Author
Pilán, Ildikó
Keywords
natural language processing
linguistic complexity
readability
CEFR
second language learning
corpus examples
text classification
machine learning
domain adaptation
Publication type
Doctoral thesis
ISBN
978-91-87850-68-4
ISSN
0347-948X
Series/Report no.
Data Linguistica
29
Language
eng