AuTopEx: Automated Topic Extraction Techniques Applied in the Software Engineering Domain
Abstract
Automatically extracting topics from scientific
papers can be very beneficial when a researcher
needs to classify a large number of
such papers.
In this thesis we develop and evaluate an
approach for Automatic Topic Extraction,
AuTopEx. The approach consists of four
parts:
1) Text pre-processing.
2) Training a Latent Dirichlet Allocation model
on part of a corpus.
3) Manually identifying relevant topics from
the model.
4) Querying the model using the rest of the corpus.
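The four parts above can be sketched in code. The abstract does not say which library or inference method the thesis uses, so what follows is only a minimal illustration under stated assumptions: a small from-scratch collapsed Gibbs sampler stands in for the LDA training step, and every name (preprocess, train_lda, query, extract_topics), the stopword list, the hyperparameters, and the 0.3 threshold are hypothetical choices made for this example.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the thesis's list).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}


def preprocess(text):
    """Step 1: lowercase, tokenize, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def _shift(model, doc_counts, word, topic, delta):
    """Add or remove one word-to-topic assignment from all count tables."""
    doc_counts[topic] += delta
    model["topic_word"][topic][word] += delta
    model["topic_total"][topic] += delta


def _sample(rng, doc_counts, word, model):
    """Draw a topic for one word from the collapsed Gibbs conditional."""
    a, b, v = model["alpha"], model["beta"], len(model["vocab"])
    weights = [
        (doc_counts[k] + a) * (model["topic_word"][k][word] + b)
        / (model["topic_total"][k] + v * b)
        for k in range(model["num_topics"])
    ]
    return rng.choices(range(model["num_topics"]), weights=weights)[0]


def train_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Step 2: fit LDA on the training part of the corpus (token lists)."""
    rng = random.Random(seed)
    model = {
        "num_topics": num_topics, "alpha": alpha, "beta": beta,
        "vocab": {w for doc in docs for w in doc},
        "topic_word": [Counter() for _ in range(num_topics)],
        "topic_total": [0] * num_topics,
    }
    doc_topic = [[0] * num_topics for _ in docs]
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            _shift(model, doc_topic[d], w, z, +1)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                _shift(model, doc_topic[d], w, assign[d][i], -1)
                assign[d][i] = _sample(rng, doc_topic[d], w, model)
                _shift(model, doc_topic[d], w, assign[d][i], +1)
    return model


def query(model, tokens, iterations=50, seed=0):
    """Step 4: infer topic proportions for an unseen document.

    The trained topic-word counts are held fixed; words the model has
    never seen are ignored.
    """
    rng = random.Random(seed)
    k_range = range(model["num_topics"])
    known = [w for w in tokens if w in model["vocab"]]
    assign = [rng.randrange(model["num_topics"]) for _ in known]
    counts = [assign.count(k) for k in k_range]
    for _ in range(iterations):
        for i, w in enumerate(known):
            counts[assign[i]] -= 1
            assign[i] = _sample(rng, counts, w, model)
            counts[assign[i]] += 1
    total = len(known) + model["num_topics"] * model["alpha"]
    return [(counts[k] + model["alpha"]) / total for k in k_range]


def extract_topics(model, text, relevant, threshold=0.3):
    """Report the manually selected topics that a document is about."""
    dist = query(model, preprocess(text))
    return [k for k in sorted(relevant) if dist[k] >= threshold]


if __name__ == "__main__":
    train = [preprocess(t) for t in [
        "Lane detection for autonomous vehicles with camera sensors",
        "Camera and lidar sensors for lane detection",
        "Software testing and verification of vehicle control software",
        "Verification and testing of safety requirements",
    ]]
    model = train_lda(train, num_topics=2)
    for k, counts in enumerate(model["topic_word"]):
        print(k, [w for w, _ in counts.most_common(4)])
    relevant = {0}  # Step 3: chosen by a person after reading the topic words
    print(extract_topics(model, "Lidar sensors for lane keeping", relevant))
```

Step 3 stays manual in this sketch: a person reads the top words printed for each topic and decides which topic ids go into relevant. In practice an established LDA implementation would likely replace the hand-written sampler; it is used here only to keep the example free of external dependencies.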
We show that it is possible to automatically
extract topics by applying AuTopEx on a corpus
of scientific papers on autonomous vehicles.
According to our evaluation, AuTopEx works
better on full-text articles than on texts consisting
of just title, abstract, and keywords.
Finally, we show that this approach is vastly
faster than human annotators, although not as
accurate.
Degree
Student essay
Date
2016-06-27
Author
Johansson, Magnus
Klemetz, Jonathan
Language
eng