Empowering Empirical Research in Software Design: Construction and Studies on a Large-Scale Corpus of UML Models
Abstract
Context: In modern software development, software modeling is considered to be an essential part of the software architecture and design activities. The Unified Modeling Language (UML) has become the de facto standard for software modeling in industry. Surprisingly, there are only a few empirical studies on the practices and impacts of UML modeling in software development. This is mainly due to the lack of empirical data on real-life software systems that use UML modeling.
Objective: This PhD thesis addresses this gap by describing a method to build and curate a large corpus of open-source software (OSS) projects that contain UML models. It then offers observations on the practices and impacts of UML modeling in these OSS projects.
Method: We combine techniques from repository mining and image classification to identify more than 24,000 open-source projects on GitHub that together contain more than 93,000 UML models. Machine learning techniques are also used to enrich the corpus with annotations. Finally, various empirical studies, including a case study, a user study, a large-scale survey and an experiment, have been carried out across this set of projects.
Results: UML is generally perceived to be helpful to new contributors. The most important motivation for using UML seems to be to facilitate collaboration. In particular, teams use UML when communicating about and planning joint implementation efforts. Our study also shows that the use of UML modeling has a positive impact on software quality, i.e., it correlates with lower defect proneness. Further, we find that visualisation of design concepts, such as class role-stereotypes, helps developers perform better in software comprehension tasks.
Parts of work
1. Truong Ho-Quang, Michel R.V. Chaudron, Ingimar Samúelsson, Jóel Hjaltason, Bilal Karasneh, and Hafeez Osman. "Automatic classification of UML class diagrams from images." Published in the 21st Asia-Pacific Software Engineering Conference (APSEC 2014), vol. 1, pp. 399-406. IEEE, 2014. doi:10.1109/APSEC.2014.65
2. Regina Hebig, Truong Ho-Quang, Michel R.V. Chaudron, Gregorio Robles, and Miguel Angel Fernandez. "The quest for open source projects that use UML: mining GitHub." Published in Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems, pp. 173-183. ACM, 2016. doi:10.1145/2976767.2976778
3. Truong Ho-Quang, Regina Hebig, Gregorio Robles, Michel R.V. Chaudron, and Miguel Angel Fernandez. "Practices and perceptions of UML use in open source projects." Published in 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), pp. 203-212. IEEE, 2017. doi:10.1109/ICSE-SEIP.2017.28
4. Mohd Hafeez Osman, Truong Ho-Quang, and Michel R.V. Chaudron. "An automated approach for classifying reverse-engineered and forward-engineered UML class diagrams." In 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 396-399. IEEE, 2018. doi:10.1109/SEAA.2018.00070
5. Truong Ho-Quang, Michel R.V. Chaudron, Regina Hebig, and Gregorio Robles. "Challenges and Directions for a Community Infrastructure for Big Data-driven Research in Software Architecture." Accepted as a chapter in the book "Model Management and Analytics for Large Scale Systems", to be published by Elsevier (expected release date: October 4, 2019). Paperback ISBN: 9780128166499.
6. Adithya Raghuraman, Truong Ho-Quang, Michel R.V. Chaudron, Alexander Serebrenik, and Bogdan Vasilescu. "Does UML modeling associate with higher software quality in open-source software?" Accepted (in press) at the International Conference on Mining Software Repositories, 2019.
7. Truong Ho-Quang, Arif Nurwidyantoro, and Michel R.V. Chaudron. "Using Machine Learning for Automated Classification of Class Responsibility Stereotypes in Software Design." Under submission.
8. Truong Ho-Quang, Alexandre Bergel, Arif Nurwidyantoro, and Michel R.V. Chaudron. "Interactive Role Stereotype-Based Visualization To Comprehend Software Architecture." Under submission.
Degree
Doctor of Philosophy
University
Göteborgs universitet. IT-fakulteten
Institution
Department of Computer Science and Engineering ; Institutionen för data- och informationsteknik
Disputation
Wednesday, October 9th, 2019, 13.00, Dome of Visions, Campus Lindholmen, Lindholmsplatsen, Gothenburg
Date of defence
2019-10-09
truongh@chalmers.se
ho.quang.truong@gu.se
truonghoquang@gmail.com
Date
2019-09-18
Author
Truong, Ho-Quang
Keywords
Software Modeling
Software Design
Empirical Research
UML
Modeling Practices
Impacts of Modeling
Open Source System
Mining Software Repository
Data Mining
Data Curation
GitHub
Publication type
Doctoral thesis
ISBN
978-91-7833-609-8
ISSN
0346-718X
Series/Report no.
173D
Language
eng