A list of productive vocabulary generated from second language learners' essays
Abstract
Corpora for second language (L2) learning may contain receptive vocabulary, i.e. vocabulary that learners can understand, or productive vocabulary, i.e. vocabulary that L2 learners themselves are able to actively
use. Corpora containing productive vocabulary could assist both students and teachers, e.g. by tracking actual learning progress, as well as language technologists who wish to analyse L2 learners' language.
While productive vocabulary lists exist for other languages, such as the English Vocabulary Profile list, none has been made for Swedish. In this paper, we describe our project to create a Swedish
vocabulary list generated from a learner corpus, i.e. a collection of L2 learner essays compiled in electronic form. The list, named SweLL-list, contains normalised
lemma and part-of-speech tag combinations and their frequency counts.
We present the work that was done to create a part of this learner corpus and the list based on it. Furthermore, we detail a normalisation algorithm, based on Levenshtein distance, used to correct L2 word-level errors. We then describe our list in detail and analyse this resource through a comparison to SVALex, a vocabulary list based on L2 reading comprehension materials. Finally, we examine the results of the normalisation algorithm.
Examining the SweLL-list and comparing it to SVALex gave us some indications of the L2 students' progress. For example, we saw that while a large part of the vocabulary is taught at the intermediate
levels, the students' productive vocabulary does not increase accordingly until the proficient levels. Our analysis showed that Levenshtein distance is promising for correcting L2 word-level errors,
especially for longer words (more than 4 characters) and where only one spelling error had been made. To improve the normalisation for multiple errors and shorter words, more work is
needed, possibly combining the Levenshtein distance with other language technology tools.
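The Levenshtein-based normalisation described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the function names `levenshtein` and `normalise`, the toy lexicon, and the thresholds `max_distance=1` and `min_length=5` are assumptions chosen only to mirror the observation that the approach works best for words longer than 4 characters with a single error.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalise(token, lexicon, max_distance=1, min_length=5):
    """Return the closest lexicon entry for a misspelt token, or None.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    if token in lexicon:
        return token
    if len(token) < min_length:
        return None  # short words are too ambiguous for this simple approach
    best, best_distance = None, max_distance + 1
    for candidate in sorted(lexicon):  # sorted for deterministic tie-breaking
        distance = levenshtein(token, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical toy lexicon of Swedish word forms.
LEXICON = {"skola", "student", "läsa", "bok"}
corrected = normalise("studemt", LEXICON)  # one substitution from "student"
```

A short misspelling such as "bik" is left uncorrected by this sketch because it falls below `min_length`, which reflects the difficulty with shorter words noted above.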
Degree
Student essay
Date
2016-09-15
Author
Llozhi, Lorena
Keywords
SweLL list
Corpora
second language (L2)
SVALex
Swedish Kelly-list
NLP
Language
eng