Predicting software vulnerabilities using topic modeling

dc.contributor.author	Sileikis, Saimonas
dc.date.accessioned	2016-06-27T12:01:10Z
dc.date.available	2016-06-27T12:01:10Z
dc.date.issued	2016-06-27
dc.identifier.uri	http://hdl.handle.net/2077/44667
dc.description.abstract	A vulnerability database for a large C++ program was used to label source code files as either clean or vulnerable, depending on whether they were responsible for a recorded vulnerability. Latent Dirichlet allocation (LDA) was applied to the whole source code to extract hidden topics from it. Each file was assigned a topic probability distribution, as well as the status of being either clean or vulnerable. This data was used to train a machine learning algorithm to detect vulnerable source files based only on their topic distribution. In total, three different data sets were prepared from the original source code, with varying numbers of topics, numbers of documents, and LDA iterations. None of the data sets showed any ability to predict software vulnerabilities based on LDA and machine learning.	sv
dc.language.iso	eng	sv
dc.title	Predicting software vulnerabilities using topic modeling	sv
dc.type	text
dc.setspec.uppsok	Technology
dc.type.uppsok	M2
dc.contributor.department	Göteborgs universitet/Institutionen för data- och informationsteknik	swe
dc.contributor.department	University of Gothenburg/Department of Computer Science and Engineering	eng
dc.type.degree	Student essay