Improving the Performance of Machine Learning-based Methods for Continuous Integration by Handling Noise

Al-Sabbagh, Khaled

dc.citation.doi	ITF
dc.contributor.author	Al-Sabbagh, Khaled
dc.date.accessioned	2023-08-22T06:09:28Z
dc.date.available	2023-08-22T06:09:28Z
dc.date.issued	2023-08-22
dc.description.abstract	Background: Modern software development companies are increasingly implementing continuous integration (CI) practices to meet market demands for delivering high-quality features. The availability of data from CI systems presents an opportunity for these companies to leverage machine learning (ML) to create methods for optimizing the CI process. Problem: The predictive performance of these methods can be hindered by inaccurate and irrelevant information (noise). Objective: The goal of this thesis is to improve the effectiveness of ML-based methods for CI by handling noise in data extracted from source code. Methods: This thesis employs design science research and controlled experiments to study the impact of noise-handling techniques in the context of CI. It involves developing ML-based methods for optimizing regression testing (MeBoTS and HiTTs), creating a taxonomy to reduce class noise, and implementing a class noise-handling technique (DB). Controlled experiments are carried out to examine the impact of class noise-handling on MeBoTS’ performance for CI. Results: The findings show that handling class noise using the DB technique improves the performance of MeBoTS in test case selection and code change request prediction. After applying DB, the F1-score increases from 25% to 84% in test case selection and the recall increases from 15% to 25% in code change request prediction. However, handling attribute noise through a removal-based technique does not impact MeBoTS’ performance, as the F1-score remains at 66%. Code changes involving memory management and complexity should be tested with performance, load, soak, stress, volume, and capacity tests. Additionally, using the “majority filter” algorithm improves the MCC from 0.13 to 0.58 in build outcome prediction and from -0.03 to 0.57 in code change request prediction.
Conclusions: This thesis highlights the effectiveness of applying different class noise-handling techniques to improve test case selection, build outcome, and code change request predictions. Utilizing small code commits for training MeBoTS proves beneficial in filtering out test cases that do not reveal faults. Additionally, the taxonomy of dependencies offers an efficient and effective way of performing regression testing. Notably, handling attribute noise does not improve the predictions of test execution outcomes.	en
dc.gup.defencedate	2023-09-18
dc.gup.defenceplace	Lindholmen Science Park, Room Tesla, Monday, September 18th, 2023, at 13:00	en
dc.gup.department	Department of Computer Science and Engineering ; Institutionen för data- och informationsteknik	en
dc.gup.mail	khaled.al-sabbagh@gu.se	en
dc.gup.origin	University of Gothenburg, IT Faculty	en
dc.identifier.isbn	978-91-8069-362-2
dc.identifier.uri	https://hdl.handle.net/2077/77272
dc.language.iso	eng	en
dc.relation.haspart	Al-Sabbagh, K., Staron, M., Hebig, R., & Meding, W. (2019). Predicting Test Case Verdicts Using Textual Analysis of Commited Code Churns. In CEUR Workshop Proceedings (Vol. 2476, pp. 138-153).	en
dc.relation.haspart	Al-Sabbagh, K. W., Hebig, R., & Staron, M. (2020, November). The effect of class noise on continuous test case selection: A controlled experiment on industrial data. In International Conference on Product-Focused Software Process Improvement (pp. 287-303). Cham: Springer International Publishing.	en
dc.relation.haspart	Al-Sabbagh, K. W., Staron, M., & Hebig, R. (2022). Improving test case selection by handling class and attribute noise. Journal of Systems and Software, 183, 111093.	en
dc.relation.haspart	Al-Sabbagh, K., Staron, M., Hebig, R., & Gomes, F. (2021, August). A classification of code changes and test types dependencies for improving machine learning based test selection. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (pp. 40-49).	en
dc.relation.haspart	Al-Sabbagh, K. W., Staron, M., & Hebig, R. (2022, November). Improving Software Regression Testing Using a Machine Learning-Based Method for Test Type Selection. In International Conference on Product-Focused Software Process Improvement (pp. 480-496). Cham: Springer International Publishing.	en
dc.relation.haspart	Al-Sabbagh, K., Staron, M., & Hebig, R. (2022, November). Predicting build outcomes in continuous integration using textual analysis of source code commits. In Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering (pp. 42-51).	en
dc.relation.haspart	Al-Sabbagh, K., Staron, M., & Hebig, R. (2023, June). The Impact of Class Noise-handling on the Effectiveness of Machine Learning-based Methods for Build Outcome and Code Change Request Predictions. Submitted to ACM Transactions on Software Engineering and Methodology.	en
dc.subject	Continuous Integration	en
dc.subject	Noise in software programs	en
dc.subject	Noise-handling	en
dc.subject	Software regression testing	en
dc.subject	Code change requests	en
dc.subject	Build prediction	en
dc.title	Improving the Performance of Machine Learning-based Methods for Continuous Integration by Handling Noise	en
dc.type	Text
dc.type.degree	Doctor of Philosophy	en
dc.type.svep	Doctoral thesis