NEURAL MACHINE TRANSLATION FROM NORTH SÁMI TO SWEDISH
Abstract
Neural machine translation is a method used in automatic translation that makes use of artificial neural
networks. A single model takes an input sequence and predicts the most likely output sequence of words
after being trained on parallel data.
In this master's thesis, a neural machine translation model for the language pair North Sámi - Swedish was
developed and trained. Since no parallel corpus exists for the two languages, a Norwegian - North Sámi data
set of about 225,000 sentences was translated to Swedish and used as training data. The model architecture
is based on the transformer of Vaswani et al. (2017), which is the state-of-the-art approach if enough
parallel data is available. Following the approach of Sennrich et al. (2016) of combining methods to lower
the amount of necessary data, a BLEU score of 44.11 was achieved. Because the amount of available parallel
data is relatively small, techniques for incorporating monolingual data and creating additional synthetic
data were implemented, but they did not result in any further improvements.
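For readers unfamiliar with the BLEU metric reported above, the following is a minimal sketch of a corpus-level BLEU computation. It is not the evaluation setup used in the thesis, which is not described on this page. The function names `ngrams` and `corpus_bleu`, the whitespace tokenization, the single reference per sentence, and the toy English sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Inputs are lists of tokenized sentences (lists of strings).
    No smoothing is applied, so an n-gram order without any match gives 0.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy data (hypothetical, not taken from the thesis).
hypothesis = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
print(corpus_bleu([hypothesis], [reference]))
```

Scores from a sketch like this are only comparable with published numbers when tokenization and reference handling match, which is why standardized tools are normally used for reporting.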
Degree
Student essay
Date
2022-03-08
Author
Pfau, Merle
Keywords
Neural Machine Translation, low-resource language, North Sámi - Swedish
Language
eng