Automatic proficiency level prediction for Intelligent Computer-Assisted Language Learning

Pilán, Ildikó

Abstract

With the ever-growing presence of electronic devices in our everyday lives, it is compelling to investigate how technology can contribute to making our language learning process more efficient and enjoyable. A fundamental piece of this puzzle is the ability to measure the complexity of the language that learners are able to deal with and produce at different stages of their progress. In this thesis, we explore automatic approaches for modeling linguistic complexity at different levels of learning Swedish as a second and foreign language (L2). To this end, we employ natural language processing techniques to extract linguistic features and combine them with machine learning methods. We study linguistic complexity in two types of L2 texts: those written by experts for learners and those produced by learners themselves. Moreover, we extend this type of data-driven analysis to the smaller unit of sentences. Automatic proficiency level prediction has a number of potential applications in the field of Intelligent Computer-Assisted Language Learning, of which we investigate two. First, it can facilitate locating learning materials suitable for L2 learners in corpora, which are valuable and easily accessible examples of authentic language use. We propose a framework for selecting sentences suitable as exercise items which, besides linguistic complexity, encompasses a number of additional criteria such as well-formedness and independence from a larger textual context. An empirical evaluation of the system implemented using these criteria indicated its usefulness in an L2 instructional setting. Second, linguistic complexity analysis enables the automatic evaluation of L2 texts, which, besides being helpful for preparing learning materials, can also be employed for assessing learners' writing.
We show that models trained partly or entirely on reading texts can effectively predict the proficiency level of learner essays, especially if some learner errors are automatically corrected in a pre-processing step. Both the sentence selection and the L2 text evaluation systems have been made freely available on an online learning platform.
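The feature-based approach summarized in the abstract can be illustrated with a minimal Python sketch. It assumes three shallow surface indicators of linguistic complexity (words per sentence, characters per word, and the percentage of words longer than six characters, the last being an ingredient of the LIX readability formula often used for Swedish) and a nearest-centroid classifier. The thesis relies on a much richer set of NLP-derived features and its own machine learning setup. Every function name here is hypothetical, including `correct_errors`, a toy lookup table standing in for automatic learner-error correction. The Swedish training sentences are invented toy inputs, not data from the thesis.

```python
import re
from statistics import mean, pstdev

# All names below are hypothetical illustrations, not the thesis's implementation.

def tokenize(text: str) -> tuple[list[str], int]:
    """Return word tokens and a naive sentence count (split on . ! ?)."""
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so å, ä, ö are kept
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return words, max(len(sentences), 1)


def complexity_features(text: str) -> list[float]:
    """Words per sentence, characters per word, percentage of words longer than 6 chars."""
    words, n_sentences = tokenize(text)
    if not words:
        return [0.0, 0.0, 0.0]
    words_per_sentence = len(words) / n_sentences
    chars_per_word = mean(len(w) for w in words)
    pct_long_words = 100 * sum(len(w) > 6 for w in words) / len(words)
    return [words_per_sentence, chars_per_word, pct_long_words]


def lix(text: str) -> float:
    """LIX = words/sentences + 100 * long words/words (sentences approximated naively)."""
    words_per_sentence, _, pct_long_words = complexity_features(text)
    return words_per_sentence + pct_long_words


def train(texts: list[str], levels: list[str]):
    """Nearest-centroid model: scale each feature by its std, then average per level."""
    vectors = [complexity_features(t) for t in texts]
    scales = [pstdev(col) or 1.0 for col in zip(*vectors)]  # avoid division by zero
    grouped: dict[str, list[list[float]]] = {}
    for vec, level in zip(vectors, levels):
        grouped.setdefault(level, []).append([v / s for v, s in zip(vec, scales)])
    centroids = {lvl: [mean(col) for col in zip(*rows)] for lvl, rows in grouped.items()}
    return scales, centroids


def predict(text: str, model) -> str:
    """Return the level whose centroid is closest (squared Euclidean distance)."""
    scales, centroids = model
    x = [v / s for v, s in zip(complexity_features(text), scales)]
    return min(centroids, key=lambda lvl: sum((a - b) ** 2 for a, b in zip(x, centroids[lvl])))


def correct_errors(text: str, corrections: dict[str, str]) -> str:
    """Hypothetical pre-processing: replace known misspellings from a lookup table."""
    return re.sub(r"\w+", lambda m: corrections.get(m.group(0), m.group(0)), text)


# Toy, invented training texts (one per level) purely for illustration.
model = train(
    ["Jag heter Anna. Jag bor i Lund. Jag har en katt.",
     "Utbildningsdepartementets förslag beträffande kompetensutveckling kritiserades omfattande."],
    ["A1", "C1"],
)
essay = correct_errors("Han heter Erik. Han bor i Malmö och har en kat.", {"kat": "katt"})
print(predict(essay, model))  # → A1
```

In the transfer scenario described above, `train` would be fitted on leveled coursebook texts and `predict` applied to learner essays, optionally after an error-correction step such as `correct_errors`. A nearest-centroid model is used here only because it fits in a few lines; any standard classifier could take its place.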

Parts of work

Pilán, Ildikó, Sowmya Vajjala and Elena Volodina 2016. A readable read: automatic assessment of language learning materials based on linguistic complexity. International Journal of Computational Linguistics and Applications (IJCLA) 7 (1): 143–159.

Pilán, Ildikó 2016. Detecting Context Dependence in Exercise Item Candidates Selected from Corpora. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 151–161.

Pilán, Ildikó, Elena Volodina and Lars Borin 2017. Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. Traitement Automatique des Langues (TAL) Journal, Special issue on NLP for learning and teaching 57 (3): 67–91.

Pilán, Ildikó, Elena Volodina and Torsten Zesch 2016. Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks. In Proceedings of the 26th International Conference on Computational Linguistics (COLING), 2101–2111.

Pilán, Ildikó, David Alfter and Elena Volodina 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners’ writings. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 120–126.

Pilán, Ildikó and Elena Volodina. Investigating the importance of linguistic complexity features across different datasets related to language learning. Submitted.

Degree

Doctor of Philosophy

University

Göteborgs universitet. Humanistiska fakulteten

University of Gothenburg. Faculty of Arts

Institution

Department of Swedish; Institutionen för svenska språket

Disputation

13.15, Stora hörsalen (2150), Eklandagatan 86

Date of defence

2018-06-14

E-mail

ildiko.pilan@gmail.com

URI

http://hdl.handle.net/2077/55895

Date

2018-05-17

Keywords

natural language processing

linguistic complexity

readability

CEFR

second language learning

corpus examples

text classification

machine learning

domain adaptation

Publication type

Doctoral thesis

ISBN

978-91-87850-68-4

ISSN

0347-948X

Series/Report no.

Data Linguistica

Language

eng
