DISAMBIGUATING SEMANTIC ROLES IN SWEDISH COMPOUNDS with Swedish FrameNet and SALDO
Abstract
Compounding in Swedish is productive, recursive, and frequent in both text and speech. Compounds can be ambiguous on many levels, and processing them involves segmentation, lemma disambiguation, word sense disambiguation, and semantic analysis. In this thesis, we focus on the last of these.
We concretise the semantic analysis as semantic role disambiguation, meaning the automatic analysis of the relationship between the two parts of a compound (prefix and suffix) given a set of semantic roles selected by the suffix. The system architecture revolves around lexical resources such as the Swedish FrameNet (SweFN) and SALDO. In two experimental rounds, we train on (1) chunked and semantic role-analysed sentences, and (2) compounds marked up using the frames and semantic roles of SweFN. For instance, laxröra ‘salmon casserole’ is analysed as Constituent_parts+LU (LU=lexical unit) in the Food frame.
The training data of tagged sentences used in predicting compound semantic roles is deemed too sparse, and produces only a small improvement over a most-frequent-class baseline. In our final experiments, we use a narrowed-down set of frames and compounds as both training and test data. We reach a best classification accuracy of 62% against a 33% baseline on 100 unseen compounds.
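As an illustration only, the most-frequent-class baseline mentioned above can be sketched as follows. This is not the thesis code: the record layout (CompoundAnalysis), the function names, and every data row except laxröra are assumptions made for the example, and "Role_B" is a placeholder label rather than a SweFN role.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundAnalysis:
    """One annotated compound (hypothetical layout, not the thesis format)."""
    compound: str
    frame: str  # frame evoked by the suffix, e.g. "Food"
    role: str   # semantic role assigned to the prefix


def most_frequent_role(train):
    """Return the role label that occurs most often in the training data."""
    counts = Counter(item.role for item in train)
    return counts.most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Only the first row comes from the abstract; the rest are invented
# placeholders so that the sketch has something to count.
TRAIN = [
    CompoundAnalysis("laxröra", "Food", "Constituent_parts"),
    CompoundAnalysis("placeholder_1", "Food", "Constituent_parts"),
    CompoundAnalysis("placeholder_2", "Food", "Role_B"),
]
TEST = [
    CompoundAnalysis("placeholder_3", "Food", "Constituent_parts"),
    CompoundAnalysis("placeholder_4", "Food", "Role_B"),
]

# The baseline predicts the same role for every unseen compound.
baseline_role = most_frequent_role(TRAIN)
predictions = [baseline_role] * len(TEST)
baseline_accuracy = accuracy(predictions, [item.role for item in TEST])
```

A trained classifier is then judged by how far its accuracy on the same unseen compounds exceeds this figure.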
Degree level
Student essay
Date
2016-10-17
Author
Hedberg, Karin
Keywords
compounds
disambiguation
semantic analysis
Swedish FrameNet
SALDO
Publication type
H2
Language
eng