Understanding machine learning software defect predictions

Geanderson Esteves; Eduardo Figueiredo; Adriano Veloso; Markos Viggiato; Nivio Ziviani

Abstract

Software defects are well known in software development and can cause several problems for users and developers alike. As a result, researchers have employed distinct techniques to mitigate the impact of these defects in the source code. One of the most notable techniques focuses on defect prediction using machine learning methods, which could help developers handle defects before they reach the production environment. These studies provide alternative approaches to predicting the likelihood of defects. However, most of these works predict defects from a vast set of software features. Another key issue with the current literature is the lack of a satisfactory explanation of the reasons that drive software to a defective state. To address these issues, we use a tree boosting algorithm (XGBoost) that receives as input a training set of records of easy-to-compute characteristics of each module and outputs whether the corresponding module is defect-prone. To exploit the link between predictive power and model explainability, we propose a simple model sampling approach that finds accurate models with the minimum set of features. Our principal idea is that features that do not increase predictive power should not be included in the model. Interestingly, the reduced set of features helps to increase model explainability, which is important for telling developers which features of each code module relate to it being more defect-prone. We evaluate our models on diverse projects within the Jureczko datasets and show that (i) the features that contribute most to finding the best models may vary depending on the project and (ii) it is possible to find effective models that use few features, leading to better understandability. We believe our results are useful to developers, as we provide the specific software features that influence the defectiveness of the selected projects.
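The abstract combines two ideas. The first is a classifier over easy-to-compute module metrics. The second is a search for the smallest feature set that keeps predictive power. Both can be illustrated with a minimal sketch, which is not the authors' implementation:

- XGBoost is swapped for a hypothetical nearest-centroid classifier, so the example runs with the Python standard library alone.
- "Model sampling" is interpreted here, as an assumption, as follows. Draw feature subsets of increasing size, score each on a validation split, and keep the smallest subset whose accuracy is within a tolerance of the best.
- The function names, the parameters `samples_per_size`, `tolerance` and `seed`, and any metric names are illustrative only.

```python
import itertools
import math
import random


def nearest_centroid_accuracy(train, train_y, test, test_y, features):
    """Fit a nearest-centroid classifier on `features`; return test accuracy.

    Hypothetical stand-in for XGBoost so the sketch needs only the standard
    library. Rows are dicts mapping a metric name to a number; labels are
    0 (clean) or 1 (defect-prone).
    """
    centroids = {}
    for label in (0, 1):
        rows = [r for r, y in zip(train, train_y) if y == label]
        if not rows:
            raise ValueError(f"training data has no rows with label {label}")
        # Mean value of each selected metric for this class.
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(row):
        point = [row[f] for f in features]
        # Closest centroid wins; on an exact tie, label 0 (clean) is chosen.
        return min(centroids, key=lambda lbl: math.dist(point, centroids[lbl]))

    correct = sum(predict(r) == y for r, y in zip(test, test_y))
    return correct / len(test)


def sample_minimal_model(train, train_y, test, test_y, features,
                         samples_per_size=20, tolerance=0.0, seed=0):
    """Search feature subsets of increasing size for the smallest accurate one.

    For each subset size k, every k-feature combination is scored when there
    are at most `samples_per_size` of them; otherwise that many are drawn at
    random. Returns the smallest subset whose accuracy is within `tolerance`
    of the best accuracy observed, together with that accuracy.
    """
    rng = random.Random(seed)
    results = []  # (accuracy, feature subset) for every model trained
    for k in range(1, len(features) + 1):
        # Materializing all combinations grows combinatorially with the
        # number of features; fine for a sketch, not for large metric sets.
        subsets = list(itertools.combinations(features, k))
        if len(subsets) > samples_per_size:
            subsets = rng.sample(subsets, samples_per_size)
        for subset in subsets:
            acc = nearest_centroid_accuracy(train, train_y, test, test_y, subset)
            results.append((acc, subset))

    best = max(acc for acc, _ in results)
    # Features that do not raise accuracy are not worth keeping: among the
    # near-best models, prefer fewer features, then higher accuracy.
    _, neg_acc, subset = min(
        (len(s), -acc, s) for acc, s in results if acc >= best - tolerance
    )
    return list(subset), -neg_acc
```

Consider toy data in which only one metric (say, a hypothetical `loc` column) separates defect-prone from clean modules, and a second metric carries no signal. There the search keeps the informative metric alone, because adding the uninformative one does not raise accuracy. This mirrors the principle stated in the abstract. In a real setting, the scoring function would be replaced by the actual model, for example `xgboost.XGBClassifier`, and by the evaluation metric of choice.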

Bibliographic information

Source
Automated software engineering, 2020, Issue 4, pp. 369-392 (24 pages)
Authors
Geanderson Esteves; Eduardo Figueiredo; Adriano Veloso; Markos Viggiato; Nivio Ziviani
Author affiliations

Department of Computer Science, Universidade Federal de Minas Gerais, and Kunumi, Belo Horizonte, Brazil

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada

Department of Computer Science, Universidade Federal de Minas Gerais, and Kunumi, Belo Horizonte, Brazil
Indexing information
Original format: PDF
Text language: English
Keywords
Software defects; Explainable models; Jureczko datasets; SHAP values

Similar documents

1. Lessons Learned from the Assessment of Software Defect Prediction on WLCG Software: A Study with Unlabelled Datasets and Machine Learning Techniques [J]. Elisabetta Ronchieri, Marco Canaparo, Mauro Belgiovine. EPJ Web of Conferences, 2020, Issue 4.

2. Software defect prediction based on weighted extreme learning machine [J]. Gai Jinjing, Zheng Shang, Yu Hualong. Multiagent and grid systems, 2020, Issue 1.

3. Software defect prediction based on kernel PCA and weighted extreme learning machine [J]. Zhou Xu, Jin Liu, Xiapu Luo. Information and software technology, 2019, Issue FEBa.

4. On the Defect Prediction for Large Scale Software Systems – From Defect Density to Machine Learning [C]. Satya Pradhan, Venky Nanniyur, Pavan K. Vissapragada. International Conference on Software Quality, Reliability and Security, 2020.

5. The Effects of Parameter Tuning on Machine Learning Performance in a Software Defect Prediction Context [D]. Province, Benjamin N. 2015.

6. Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques [O]. Bilal Khan, Rashid Naseem, Muhammad Arif Shah. 2021.
