Machine Learning for Reducing the Effort of Conducting Systematic Reviews in SE

Su, Chuan

dc.contributor.author	Su, Chuan
dc.date.accessioned	2015-05-05T13:34:45Z
dc.date.available	2015-05-05T13:34:45Z
dc.date.issued	2015-05-05
dc.identifier.uri	http://hdl.handle.net/2077/38844
dc.description.abstract	Objective: To investigate whether machine learning and text-based data mining can support the primary study selection process and reduce the effort required in systematic reviews conducted in the context of SE. Research Design: A test collection was built from three systematic reviews used in previous work in the context of SE. A probabilistic classifier based on Bayes’ Theorem was constructed to predict whether each article contains evidence of sufficient quality to warrant inclusion in the study selection process. Feature engineering techniques were applied to the abstract-based features. Cross-validation experiments were performed to evaluate the efficiency of the document classifier. Three metrics (precision, recall, and specificity) were used together to measure classification performance. We assume that a recall of 0.9 or higher is required for the classifier to identify a sufficient quantity of relevant papers. As long as recall is at least 0.9, precision and specificity should be as high as possible. Results: In the hold-out cross-validation experiment, the classifier achieved a precision of 93% for two of the systematic review topics and 79% for the third. The results of the leave-one-out cross-validation experiment were presented in three confusion matrices, which showed that, for all three systematic review topics, the classifier was promising at predicting relevant abstracts but relatively poor at excluding irrelevant articles. Conclusion: The classifier based on Bayes’ Theorem has strong potential for performing systematic review classification tasks in software engineering. The approach presented in this paper could be considered a possible technique for assisting the labor-intensive primary study selection process in an SLR.	sv
dc.language.iso	eng	sv
dc.subject	Machine learning	sv
dc.subject	Systematic review	sv
dc.subject	Naive Bayes classifier	sv
dc.subject	Text classification	sv
dc.subject	Software engineering	sv
dc.subject	Metrics	sv
dc.subject	Recall	sv
dc.title	Machine Learning for Reducing the Effort of Conducting Systematic Reviews in SE	sv
dc.type	text
dc.setspec.uppsok	Technology
dc.type.uppsok	M2
dc.contributor.department	Göteborgs universitet/Institutionen för data- och informationsteknik	swe
dc.contributor.department	University of Gothenburg/Department of Computer Science and Engineering	eng
dc.type.degree	Student essay
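
The abstract evaluates the classifier with precision, recall, and specificity, and requires recall of at least 0.9 before the other two metrics are compared. The following is a minimal sketch of how those three metrics can be derived from a confusion matrix. The class name, the helper function, and the example counts are hypothetical and chosen for illustration; they are not taken from the thesis or its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary include/exclude decision on abstracts."""
    tp: int  # relevant abstracts predicted relevant
    fp: int  # irrelevant abstracts predicted relevant
    tn: int  # irrelevant abstracts predicted irrelevant
    fn: int  # relevant abstracts predicted irrelevant

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def specificity(self) -> float:
        denom = self.tn + self.fp
        return self.tn / denom if denom else 0.0


def meets_recall_target(cm: ConfusionMatrix, target: float = 0.9) -> bool:
    """Check the abstract's stated requirement: recall of at least 0.9."""
    return cm.recall() >= target


# Hypothetical counts, for illustration only (not results from the thesis).
example = ConfusionMatrix(tp=45, fp=5, tn=30, fn=5)
print(f"precision:   {example.precision():.3f}")
print(f"recall:      {example.recall():.3f}")
print(f"specificity: {example.specificity():.3f}")
print(f"recall target met: {meets_recall_target(example)}")
```

Under this reading, a classifier is first screened on recall, and only those that pass are ranked by precision and specificity, which is why a model can look strong at finding relevant abstracts while still excluding few irrelevant ones.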

Files in this item

Name: gupea_2077_38844_1.pdf
Size: 2.002Mb
Format: PDF
Description: Bachelor Thesis

This item appears in the following Collection(s)

Kandidatuppsatser
