
dc.contributor.author: Johansson, Magnus
dc.contributor.author: Klemetz, Jonathan
dc.date.accessioned: 2016-06-27T11:47:51Z
dc.date.available: 2016-06-27T11:47:51Z
dc.date.issued: 2016-06-27
dc.identifier.uri: http://hdl.handle.net/2077/44662
dc.description.abstract: Automatically extracting topics from scientific papers can be very beneficial when a researcher needs to classify a large number of such papers. In this thesis we develop and evaluate an approach for Automatic Topic Extraction, AuTopEx. The approach comprises four parts: 1) text pre-processing; 2) training a Latent Dirichlet Allocation model on part of a corpus; 3) manually identifying relevant topics from the model; 4) querying the model using the rest of the corpus. We show that it is possible to automatically extract topics by applying AuTopEx to a corpus of scientific papers on autonomous vehicles. According to our evaluation, AuTopEx works better on full-text articles than on texts consisting of just title, abstract, and keywords. Finally, we show that this approach is vastly faster than human annotators, although not as accurate. [sv]
dc.language.iso: eng [sv]
dc.title: AuTopEx: Automated Topic Extraction Techniques Applied in the Software Engineering Domain [sv]
dc.type: text
dc.setspec.uppsok: Technology
dc.type.uppsok: M2
dc.contributor.department: Göteborgs universitet/Institutionen för data- och informationsteknik [swe]
dc.contributor.department: University of Gothenburg/Department of Computer Science and Engineering [eng]
dc.type.degree: Student essay
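
The four parts listed in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the stopword list, the collapsed Gibbs sampler, the fold-in style query, and every function name and parameter below (preprocess, train_lda, top_words, query, extract_topics, threshold) are assumptions chosen to keep the example small and dependency-free.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the thesis's list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "with"}


def preprocess(text):
    """Step 1: lowercase, tokenize, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def train_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Step 2: fit LDA on tokenized docs with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = {w for doc in docs for w in doc}
    word_counts = [Counter() for _ in range(num_topics)]  # topic -> word counts
    totals = [0] * num_topics                             # tokens per topic
    doc_counts = [[0] * num_topics for _ in docs]         # doc -> topic counts
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            word_counts[z][w] += 1
            totals[z] += 1
            doc_counts[d][z] += 1
        assign.append(zs)
    smooth = len(vocab) * beta
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token, then resample its topic
                word_counts[z][w] -= 1
                totals[z] -= 1
                doc_counts[d][z] -= 1
                weights = [
                    (doc_counts[d][k] + alpha)
                    * (word_counts[k][w] + beta) / (totals[k] + smooth)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights)[0]
                assign[d][i] = z
                word_counts[z][w] += 1
                totals[z] += 1
                doc_counts[d][z] += 1
    return {"K": num_topics, "vocab": vocab, "word_counts": word_counts,
            "totals": totals, "alpha": alpha, "beta": beta}


def top_words(model, topic, n=5):
    """Step 3 helper: words a human would inspect to judge a topic."""
    ranked = model["word_counts"][topic].most_common(n)
    return [w for w, c in ranked if c > 0]


def query(model, tokens, iters=20):
    """Step 4: estimate topic proportions for an unseen doc, model held fixed."""
    K, alpha, beta = model["K"], model["alpha"], model["beta"]
    smooth = len(model["vocab"]) * beta
    words = [w for w in tokens if w in model["vocab"]]
    theta = [1.0 / K] * K
    for _ in range(iters):
        resp = [0.0] * K  # expected number of tokens per topic
        for w in words:
            p = [theta[k] * (model["word_counts"][k][w] + beta)
                 / (model["totals"][k] + smooth) for k in range(K)]
            norm = sum(p)
            for k in range(K):
                resp[k] += p[k] / norm
        theta = [(resp[k] + alpha) / (len(words) + K * alpha) for k in range(K)]
    return theta


def extract_topics(model, text, relevant, threshold=0.3):
    """Keep the manually chosen topics whose share exceeds a cutoff."""
    theta = query(model, preprocess(text))
    return sorted(k for k in relevant if theta[k] > threshold)


if __name__ == "__main__":
    train = [preprocess(t) for t in [
        "Lidar sensors for autonomous vehicle perception",
        "Camera and lidar fusion in vehicle perception",
        "Unit testing and regression testing of software",
        "Automated software testing with regression suites",
    ]]
    model = train_lda(train, num_topics=2)
    for k in range(model["K"]):
        print(k, top_words(model, k))  # a human reads these and picks topics
    relevant = {0}                     # hypothetical manual choice (step 3)
    print(extract_topics(model, "Lidar perception for vehicles", relevant))
```

In this sketch the manual step is represented only by the hand-written set of relevant topic ids; which topic index corresponds to which theme depends on the random seed, so the set has to be chosen after inspecting the printed top words.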

