Show simple item record

dc.contributor.author: Nieto Piña, Luis
dc.date.accessioned: 2019-08-23T08:03:45Z
dc.date.available: 2019-08-23T08:03:45Z
dc.date.issued: 2019-08-23
dc.identifier.isbn: 978-91-87850-75-2
dc.identifier.issn: 0347-948X
dc.identifier.uri: http://hdl.handle.net/2077/60509
dc.description.abstract: The representation of written language semantics is a central problem of language technology and a crucial component of many natural language processing applications, from part-of-speech tagging to text summarization. These representations of linguistic units, such as words or sentences, allow computer applications that work with language to process and manipulate the meaning of text. In particular, a family of models has been successfully developed that automatically learns semantics from large collections of text and embeds it into a vector space, where semantic or lexical similarity is a function of geometric distance. Co-occurrence information of words in context is the main source of data used to learn these representations. Such models have typically been used to learn representations for word forms, which have proven highly successful as characterizations of semantics at the word level. However, a word-level approach to meaning representation implies that the different meanings, or senses, of any polysemous word share a single representation. This is problematic when individual word senses are of interest and explicit access to their specific representations is required: for instance, when an application needs to deal with word senses rather than word forms, or when a digital lexicon's sense inventory has to be mapped to a set of learned semantic representations. In this thesis, we present a number of models that tackle this problem by automatically learning representations for word senses instead of words, drawing on two separate sources of information: corpora and lexica for the Swedish language. Throughout the five publications compiled in this thesis, we demonstrate that it is possible to generate word sense representations from these sources of data individually and in conjunction, and we observe that combining them yields superior results in terms of accuracy and sense inventory coverage. Furthermore, in our evaluation of the different representational models proposed here, we showcase the applicability of word sense representations both to downstream natural language processing applications and to the development of existing linguistic resources. (A toy illustration of the vector-space view described here follows the record below.)
dc.language.iso: eng
dc.relation.ispartofseries: Data Linguistica
dc.relation.ispartofseries: 30
dc.relation.haspart: Luis Nieto Piña and Richard Johansson 2015. A simple and efficient method to generate word sense representations. Proceedings of the International Conference Recent Advances in Natural Language Processing, 465–472. Hissar, Bulgaria.
dc.relation.haspart: Luis Nieto Piña and Richard Johansson 2016. Embedding senses for efficient graph-based word sense disambiguation. Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing, NAACL-HLT 2016, 1–5. San Diego, USA.
dc.relation.haspart: Luis Nieto Piña and Richard Johansson 2017. Training word sense embeddings with lexicon-based regularization. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing. Taipei, Taiwan.
dc.relation.haspart: Luis Nieto Piña and Richard Johansson 2018. Automatically Linking Lexical Resources with Word Sense Embedding Models. Proceedings of the Third Workshop on Semantic Deep Learning (SemDeep-3), COLING 2018, 23–29. Association for Computational Linguistics. Santa Fe, USA.
dc.relation.haspart: Lars Borin, Luis Nieto Piña and Richard Johansson 2015. Here be dragons? The perils and promises of inter-resource lexical-semantic mapping. Proceedings of the Workshop on Semantic Resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, 1–11. Vilnius, Lithuania.
dc.subject: language technology
dc.subject: natural language processing
dc.subject: distributional models
dc.subject: semantic representations
dc.subject: distributed representations
dc.subject: word senses
dc.subject: embeddings
dc.subject: word sense disambiguation
dc.subject: linguistic resources
dc.subject: corpus
dc.subject: lexicon
dc.subject: machine learning
dc.subject: neural networks
dc.title: Splitting rocks: Learning word sense representations from corpora and lexica
dc.type: Text
dc.type.svep: Doctoral thesis
dc.type.degree: Doctor of Philosophy
dc.gup.origin: Göteborgs universitet. Humanistiska fakulteten
dc.gup.origin: University of Gothenburg. Faculty of Arts
dc.gup.department: Department of Swedish ; Institutionen för svenska språket
dc.gup.defenceplace: Friday, 13 September 2019, at 13:15, Lilla hörsalen, Humanisten, Lundgrensgatan 1B
dc.gup.defencedate: 2019-09-13
dc.gup.dissdb-fakultet: HF
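
The abstract above describes a vector-space model of lexical semantics, in which similarity between linguistic units is a function of geometric distance, and points out the problem the thesis sets out to solve: a single vector per word form conflates the senses of a polysemous word. The short Python sketch below illustrates both ideas. Everything in it is an illustrative assumption rather than any model from the thesis: the three-dimensional vectors are invented (real models learn hundreds of dimensions from co-occurrence data in large corpora), and the centroid-based sense splitting at the end is a generic expository device, with the listed neighbours standing in for the kind of sense-specific information a lexicon provides.

import numpy as np

def cosine(u, v):
    """Cosine similarity: near 1.0 for near-parallel vectors, 0.0 for orthogonal ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# One vector per *word form*: the polysemous "rock" gets a single point
# that mixes its 'stone' sense with its 'music genre' sense.
# All vectors here are invented toy values, not learned embeddings.
word_vec = {
    "rock":    np.array([0.9, 0.8, 0.1]),
    "stone":   np.array([1.0, 0.1, 0.0]),
    "granite": np.array([0.9, 0.0, 0.1]),
    "jazz":    np.array([0.0, 1.0, 0.2]),
    "punk":    np.array([0.1, 0.9, 0.3]),
}

# The single "rock" vector is fairly similar to *both* neighbourhoods,
# because it averages over all of the word form's occurrences.
print(cosine(word_vec["rock"], word_vec["stone"]))  # ~0.81
print(cosine(word_vec["rock"], word_vec["jazz"]))   # ~0.67

# One generic way to split that point into senses (an assumption for
# illustration, not the thesis's method): take the centroid of the
# vectors of each sense's lexicon neighbours, e.g. the synonyms a
# lexical resource lists for that sense.
rock_stone = (word_vec["stone"] + word_vec["granite"]) / 2  # 'rock (stone)'
rock_music = (word_vec["jazz"] + word_vec["punk"]) / 2      # 'rock (music)'

# Each sense vector now sits clearly in its own neighbourhood.
print(cosine(rock_stone, word_vec["stone"]))  # ~1.00
print(cosine(rock_stone, word_vec["jazz"]))   # ~0.06
print(cosine(rock_music, word_vec["jazz"]))   # ~1.00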

