  • Student essays / Studentuppsatser
  • Department of Philosophy, Linguistics and Theory of Science / Institutionen för filosofi, lingvistik och vetenskapsteori
  • Master

Dealing with word ambiguity in NLP. Building appropriate sense representations for Danish sense tagging by combining word embeddings with wordnet senses

Abstract
This thesis describes an approach to handling word senses in natural language processing. If language technologies are to handle word ambiguity, machines need proper sense representations. In a case study on ambiguous Danish nouns, we examined whether an appropriate sense inventory can be built by combining the distributional information of a word from a vector space model with knowledge-based information from a wordnet. We tested three sense representations in a word sense disambiguation task: first, the centroids (averages of the word vectors) of selected wordnet synset information and synset members; second, the centroids of the wordnet sample sentences alone; and third, the centroids of unlabelled sample sentences clustered around the wordnet sample sentence. Finally, we tested the features of the cluster members and of the evaluation data in supervised machine learning classifiers. In all experiments, the sense representations generally beat the random baseline significantly, but not the baseline that assigns the most frequent sense by default. The representations built from selected wordnet synset information and synset members generally gave the best results, especially for target words with rich synset information. The machine learning classifiers significantly outperformed the sense representations on the word sense disambiguation task. The best classifiers were those trained and tested on either the clustered data or the evaluation data. We conclude that combining word embeddings with wordnet-associated data to build proper sense representations seems promising. However, we suggest some improvements for future work, specifically regarding the information extracted from wordnet sample sentences.
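The centroid idea in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the three-dimensional vectors are invented toy values standing in for word2vec embeddings, the sense labels `skade_bird` and `skade_injury` are hypothetical, and the function names are chosen here for illustration only. Each sense is represented by the average of the vectors of words associated with it, and a context is assigned to the sense whose centroid has the highest cosine similarity to the context centroid.

```python
import math

def centroid(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embed(words, embeddings):
    """Centroid of the in-vocabulary words, or None if none are known."""
    known = [embeddings[w] for w in words if w in embeddings]
    return centroid(known) if known else None

def disambiguate(context_words, sense_words, embeddings):
    """Return the sense whose centroid is closest to the context centroid."""
    context_vec = embed(context_words, embeddings)
    if context_vec is None:
        return None
    best_sense, best_score = None, -2.0  # cosine is never below -1
    for sense, words in sense_words.items():
        sense_vec = embed(words, embeddings)
        if sense_vec is None:
            continue
        score = cosine(context_vec, sense_vec)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy embeddings (assumed values, not taken from a trained model).
TOY_EMBEDDINGS = {
    "fugl": [0.9, 0.1, 0.0],      # bird
    "rede": [0.8, 0.2, 0.1],      # nest
    "vinge": [0.85, 0.05, 0.1],   # wing
    "hospital": [0.1, 0.9, 0.1],
    "smerte": [0.0, 0.8, 0.2],    # pain
    "knæ": [0.1, 0.7, 0.3],       # knee
}

# Hypothetical sense inventory for the Danish noun "skade" (magpie / injury).
SENSES = {
    "skade_bird": ["fugl", "vinge"],
    "skade_injury": ["smerte", "hospital"],
}

print(disambiguate(["knæ", "hospital"], SENSES, TOY_EMBEDDINGS))
print(disambiguate(["rede", "fugl"], SENSES, TOY_EMBEDDINGS))
```

With these toy values the first context is assigned to the injury sense and the second to the bird sense. In a realistic setting the sense word lists would come from wordnet synset information or sample sentences, and the vectors from a trained embedding model.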
Degree level
Student essay
URL:
http://hdl.handle.net/2077/58385
Collections
  • Master
File(s)
Master's thesis, language technology (1.916 MB)
Date
2018-12-13
Author
Rørmann Olsen, Ida
Keywords
sense embeddings
wordnet
word2vec
word sense disambiguation
clustering
machine learning
supervised WSD
Publication type
H2
Language
eng

DSpace software copyright © 2002-2016  DuraSpace
gup@ub.gu.se | Technical help
Theme by 
Atmire NV
 

 
