Dealing with word ambiguity in NLP. Building appropriate sense representations for Danish sense tagging by combining word embeddings with wordnet senses

Rørmann Olsen, Ida

dc.contributor.author	Rørmann Olsen, Ida
dc.date.accessioned	2018-12-13T11:15:40Z
dc.date.available	2018-12-13T11:15:40Z
dc.date.issued	2018-12-13
dc.identifier.uri	http://hdl.handle.net/2077/58385
dc.description.abstract	This thesis describes an approach to handling word senses in natural language processing. If language technologies are to handle word ambiguity, machines need proper sense representations. In a case study on ambiguous Danish nouns, we examined the possibility of building an appropriate sense inventory by combining the distributional information of a word from a vector space model with knowledge-based information from a wordnet. We tested three sense representations in a word sense disambiguation task: first, the centroids (averages of word vectors) of selected wordnet synset information and synset members; second, the centroids of the wordnet sample sentence alone; and third, the centroids of unlabelled sample sentences clustered around the wordnet sample sentence. Finally, we tested the features of the cluster members and the evaluation data in supervised machine learning classifiers. In all experiments, the sense representations generally beat the random baseline significantly, but not the most-frequent-sense baseline. The representations built from selected wordnet synset information and synset members generally gave the best results, especially for target words with rich synset information. The machine learning classifiers significantly outperformed the sense representations on the word sense disambiguation task. The best classifiers were those trained and tested on either the clustered data or the evaluation data. We conclude that combining word embeddings with wordnet-associated data to build a proper sense representation seems promising. However, we suggest some improvements for future work, specifically regarding the information extracted from wordnet sample sentences.	sv
dc.language.iso	eng	sv
dc.subject	sense embeddings	sv
dc.subject	wordnet	sv
dc.subject	word2vec	sv
dc.subject	word sense disambiguation	sv
dc.subject	clustering	sv
dc.subject	machine learning	sv
dc.subject	supervised WSD	sv
dc.title	Dealing with word ambiguity in NLP. Building appropriate sense representations for Danish sense tagging by combining word embeddings with wordnet senses	sv
dc.type	Text
dc.setspec.uppsok	HumanitiesTheology
dc.type.svep	H2
dc.contributor.department	Göteborgs universitet/Institutionen för filosofi, lingvistik och vetenskapsteori	swe
dc.contributor.department	Göteborg University/Department of Philosophy, Linguistics and Theory of Science	eng
dc.type.degree	Student essay
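
The centroid-based sense representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy two-dimensional embeddings, the English example words, the sense labels, and the helper names (centroid, cosine, disambiguate) are all hypothetical, and the use of cosine similarity to compare a context with each sense centroid is an assumption, since the abstract does not name the similarity measure.

```python
import math


def centroid(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(context_words, sense_words, embeddings):
    """Pick the sense whose centroid is closest to the context centroid.

    sense_words maps a sense label to the words describing that sense
    (for example synset members). Words without an embedding are skipped.
    """
    def embed(words):
        return centroid([embeddings[w] for w in words if w in embeddings])

    context_vec = embed(context_words)
    scores = {
        sense: cosine(context_vec, embed(words))
        for sense, words in sense_words.items()
    }
    return max(scores, key=scores.get)


# Toy embeddings and senses, invented for illustration only.
toy_embeddings = {
    "money": [1.0, 0.0], "loan": [0.9, 0.1],
    "river": [0.0, 1.0], "water": [0.1, 0.9],
}
toy_senses = {
    "bank_finance": ["money", "loan"],
    "bank_river": ["river", "water"],
}
predicted = disambiguate(["water"], toy_senses, toy_embeddings)
```

In practice the vectors would come from a trained word2vec model and the sense words from wordnet synsets; the sketch only shows how averaging and a nearest-centroid comparison fit together.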

Files in this item

Name: gupea_2077_58385_1.pdf
Size: 1.916 MB
Format: PDF
Description: Master's thesis in language technology

This item appears in the following collection(s)

Master