APPLYING MACHINE LEARNING ALGORITHMS TO DETECT LINES OF CODE CONTRIBUTING TO TECHNICAL DEBT
Abstract
This paper investigates the viability of finding lines of code (LOC) that
contribute to technical debt (TD) using machine learning (ML), by trying to
imitate the static code analysis tool SonarQube. Industry professionals first
chose the SonarQube rules to target. Different classifiers were then trained
with the help of CCFlex (a tool for training classifiers on lines of code),
using SonarQube as an oracle (a source of training sample data) that selects
the faulty lines of code. The codebase consisted of a couple of proprietary
software solutions provided by Diadrom (a Swedish software consultancy
company), along with open-source software such as ColourSharp [9].
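The setup above treats every line of code as a sample and uses the oracle's flags as labels. The sketch below illustrates this idea under stated assumptions. It does not reproduce CCFlex or the classifiers used in the study. Instead it uses a hypothetical `LineClassifier`, a small multinomial Naive Bayes over the tokens of each line, trained on lines whose labels come from a hypothetical set of oracle-flagged line indices.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a line-level classifier (not CCFlex itself):
# multinomial Naive Bayes over the tokens of each line of code.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split a line of code into identifier, number and single-symbol tokens."""
    return TOKEN_RE.findall(line)


class LineClassifier:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.token_counts = {True: Counter(), False: Counter()}
        self.line_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, lines, labels):
        """Train on lines of code and oracle labels (True = flagged as TD)."""
        for line, flagged in zip(lines, labels):
            tokens = tokenize(line)
            self.token_counts[flagged].update(tokens)
            self.line_counts[flagged] += 1
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, cls):
        total_lines = sum(self.line_counts.values())
        # Smoothed log prior, so an empty class never produces log(0).
        score = math.log((self.line_counts[cls] + self.alpha)
                         / (total_lines + 2 * self.alpha))
        # "+ 1" reserves probability mass for tokens never seen in training.
        denom = sum(self.token_counts[cls].values()) + self.alpha * (len(self.vocab) + 1)
        for tok in tokens:
            score += math.log((self.token_counts[cls][tok] + self.alpha) / denom)
        return score

    def predict(self, line):
        """Return True if the line is predicted to contribute to TD."""
        tokens = tokenize(line)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Toy training data; the oracle flags are hypothetical, not SonarQube output.
code = [
    "catch (Exception e) { }",
    "int total = a + b;",
    "// TODO: remove this hack",
    "return total;",
]
oracle_flagged = {0, 2}
labels = [i in oracle_flagged for i in range(len(code))]

clf = LineClassifier().fit(code, labels)
print(clf.predict("catch (IOException ex) { }"))  # True
print(clf.predict("return a + b;"))               # False
```

A token-level model is chosen here only because it keeps the example short and dependency-free. The classifiers trained in the study may use different features and algorithms.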
The classifiers were then analyzed for accuracy by comparing their predictions
against the oracle (SonarQube). The results demonstrate that using machine
learning algorithms to detect LOC contributing to technical debt is a promising
path that should be researched further. Within our chosen training parameters,
increasing the percentage of LOC marked by the oracle produced progressively
better recall [7] values, and recall improved more consistently this way than
by merely increasing the number of LOC used for training. Furthermore, even
though precision is generally low within our parameters (meaning that the
number of false positives is high), our classifiers still predicted many of the
actually faulty LOC. Considering the training parameters used, these results
are promising and open the way for further exploration of this topic.
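Precision and recall here are measured against the oracle. Precision is the fraction of lines flagged by a classifier that the oracle also flagged, and recall is the fraction of oracle-flagged lines that the classifier found. The sketch below computes both from sets of line identifiers. The function name `precision_recall` and the toy line IDs are hypothetical and illustrative only. They are not data from the study. The example shows how low precision (many false positives) can coexist with fairly high recall.

```python
def precision_recall(predicted, oracle):
    """Compare line IDs flagged by a classifier with those flagged by the oracle."""
    predicted, oracle = set(predicted), set(oracle)
    true_pos = len(predicted & oracle)   # flagged by both
    false_pos = len(predicted - oracle)  # flagged only by the classifier
    false_neg = len(oracle - predicted)  # missed by the classifier
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if oracle else 0.0
    return precision, recall


# Toy example: the oracle flags 4 lines; the classifier flags 10 lines,
# 3 of which the oracle also flagged.
oracle_lines = {12, 40, 41, 97}
predicted_lines = {3, 12, 15, 22, 40, 41, 55, 60, 71, 88}
p, r = precision_recall(predicted_lines, oracle_lines)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.30 recall=0.75
```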
Degree
Student essay
Date
2019-11-12
Author
Isakovski, Filip
Sauleo, Rafael Antonino
Keywords
Technical Debt
Machine Learning
Static Code Analysis
Language
eng