TOPIC MODELING FOR ANALYSIS OF PUBLIC DISCOURSE: Enriching topic modeling with linguistic information to analyze Swedish housing policies

Lindahl, Anna


Abstract

This work investigates how topic modeling can be applied to study the public discourse on Swedish housing policies. The data representing this discourse comes both from the Swedish parliament, the Riksdag, and from Swedish news texts. The housing shortage and the current housing crisis in Sweden make this a relevant area to study. Topic modeling is an unsupervised probabilistic method for finding topics in large collections of data. It is a popular method for examining public discourse; however, linguistic information is rarely included in its preprocessing steps. Therefore, this work also investigates what effect linguistically informed preprocessing has on topic modeling. Three types of linguistic information are selected and investigated: part of speech, dependency relations and lemmatization. Based on these, filters are created for the data. The filters are applied to a test set (a subset of the original data), and a topic model is trained on each filtered version of this test set. The resulting topics from each model are evaluated both by humans and by two computational measures, perplexity and semantic coherence, and the results of the respective evaluation methods are compared. The semantic coherence measure C_V is found to correlate more strongly with human ratings than NPMI coherence does. Perplexity is found not to correlate well with human ratings. Filtering the data based on part of speech is found to improve topic quality the most. Non-lemmatized topics are rated higher than lemmatized topics. Topics from the filters based on dependency relations receive low ratings. Based on the human ratings, an optimal model is chosen for each data set. The selected topic models are applied to the data, and the results are used to exemplify how they can be used for analysis. Topic modeling is found to be a suitable method for the intended analysis.
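One of the automatic measures compared above is NPMI coherence. It scores a topic by how strongly its top words co-occur in the corpus. The code below is a minimal sketch of that idea and is not the implementation used in this work. It assumes the following:

- Probabilities are estimated from document-level (boolean) co-occurrence.
- A topic's score is the mean NPMI over all pairs of its top words.
- Pairs that never co-occur get the conventional lower bound of -1.

The function names and the toy documents are hypothetical. The C_V measure, which relies on sliding windows and an indirect cosine similarity, is not shown.

```python
import math
from itertools import combinations


def npmi(w1, w2, docs):
    """Normalized PMI of two words, estimated from document co-occurrence.

    docs is a list of sets of tokens; each probability is the fraction of
    documents containing the word (or both words).
    """
    n = len(docs)
    p1 = sum(1 for d in docs if w1 in d) / n
    p2 = sum(1 for d in docs if w2 in d) / n
    p12 = sum(1 for d in docs if w1 in d and w2 in d) / n
    if p12 == 0:
        return -1.0  # words never co-occur: NPMI lower bound by convention
    if p12 == 1:
        return 1.0  # words co-occur in every document: NPMI upper bound
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_npmi(top_words, docs):
    """Mean NPMI over all unordered pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


# Hypothetical toy corpus: each document reduced to its set of tokens.
docs = [
    {"bostad", "hyra", "kommun"},
    {"bostad", "hyra"},
    {"skola", "kommun"},
    {"bostad", "byggande"},
]

print(npmi("bostad", "hyra", docs))   # positive: the words tend to co-occur
print(npmi("hyra", "skola", docs))    # -1.0: the words never co-occur
print(topic_npmi(["bostad", "hyra", "kommun"], docs))
```

The scores fall in the range [-1, 1]. Normalizing PMI by -log P(w1, w2) is what makes NPMI values comparable across word pairs with very different frequencies.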

Examination level

Student essay

Date

2018-01-15

Author

Lindahl, Anna

Keywords

topic modeling

public discourse

housing policies

LDA

semantic coherence measures

part of speech

Publication type

Language

eng
