Department of Philosophy, Linguistics and Theory of Science

THE LINGUISTIC STRUCTURE OF WIKIPEDIA: A multilingual analysis and comparison of the language used in Wikipedia articles

Abstract
Wikipedia is a great source of knowledge, but its open-collaboration nature imposes some limitations: an uneven distribution of content, a low overlap in topic coverage, differences in the comprehensiveness of articles, and a low number of editors. The Abstract Wikipedia project was created to address these limitations; its objective is to construct language-independent (abstract) articles that can be rendered in any language. In this thesis, we computationally analyse the language used in Wikipedia in order to find similarities between the language used in different articles. To do so, we syntactically parsed Wikipedia articles in different languages using UDPipe 2.0 and gathered each language's recurrent syntactic patterns using Grammatical Framework's GF-UD. We then compared the analyses with cosine similarity in two ways: based on dependency relations and based on linguistic patterns. We find that there is a basis for the Abstract Wikipedia project: there are syntactic similarities not only within one language, but also across multiple languages. In addition, we found that semantically related topics have a higher similarity than unrelated ones. Finally, we gathered and compared the syntactic patterns of every language, which can form the basis for creating the Renderers for Abstract Wikipedia.
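As an illustration of the dependency-relation comparison mentioned in the abstract, the following is a minimal sketch, not the thesis's actual implementation. It assumes that each article has already been parsed into CoNLL-U format (the output format UDPipe produces) and that an article is represented simply by the raw frequency of each dependency relation label; the function names `deprel_counts` and `cosine_similarity` are hypothetical, and the thesis may build its vectors differently.

```python
from collections import Counter
from math import sqrt


def deprel_counts(conllu_text):
    """Count dependency relation labels (DEPREL, the 8th CoNLL-U column)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        counts[cols[7]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty article is similar to nothing
    return dot / (norm_a * norm_b)


# Two toy one-sentence "articles" in CoNLL-U format (made-up data).
article_a = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)
article_b = (
    "# text = Cats sleep\n"
    "1\tCats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)

similarity = cosine_similarity(deprel_counts(article_a), deprel_counts(article_b))
print(similarity)
```

Because the two toy sentences have the same distribution of relation labels, their similarity is at its maximum; articles with no labels in common would score 0. The same function could compare counts of syntactic patterns instead of single relation labels, which corresponds to the second comparison the abstract describes.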
Degree level
Student essay
URL:
https://hdl.handle.net/2077/72155
Collections
  • Masteruppsatser / Master in Language Technology
File(s)
Master thesis (280.2Kb)
Date
2022-06-20
Author
Grau Francitorra, Patricia
Keywords
Abstract Wikipedia, Syntactic Analysis, Universal Dependencies, Grammatical Framework, UDPipe 2.0, Syntactic Patterns
Language
eng

DSpace software copyright © 2002-2016  DuraSpace
gup@ub.gu.se | Technical support
Theme by 
Atmire NV
 

 
