Journal: Automated Software Engineering
Understanding machine learning software defect predictions

Abstract

Software defects are a well-known problem in software development and can cause several problems for users and developers alike. As a result, researchers have employed distinct techniques to mitigate the impact of these defects in source code. One of the most notable techniques is defect prediction using machine learning methods, which can help developers handle defects before they reach the production environment. These studies provide alternative approaches to predicting the likelihood of defects. However, most of these works concentrate on predicting defects from a vast set of software features. Another key issue with the current literature is the lack of a satisfactory explanation of the reasons that drive the software to a defective state. In this work, we use a tree boosting algorithm (XGBoost) that receives as input a training set comprising records of easy-to-compute characteristics of each module and outputs whether the corresponding module is defect-prone. To exploit the link between predictive power and model explainability, we propose a simple model sampling approach that finds accurate models with the minimum set of features. Our principal idea is that features that do not increase predictive power should not be included in the model. Interestingly, the reduced set of features helps to increase model explainability, which is important for telling developers which features are associated with the more defect-prone modules of the code. We evaluate our models on diverse projects within the Jureczko datasets, and we show that (i) the features that contribute most to finding the best models may vary depending on the project and (ii) it is possible to find effective models that use few features, leading to better understandability. We believe our results are useful to developers because we provide the specific software features that influence the defectiveness of the selected projects.
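
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way the idea could look: sample random feature subsets, score a model on each, and keep the smallest subset whose score is within a tolerance of the best. Everything here is an assumption for illustration. The function names, the tolerance rule, and the synthetic data are hypothetical, and a nearest-centroid classifier stands in for XGBoost so that the example runs with the Python standard library alone.

```python
import random
from statistics import mean


def make_modules(n, n_features, rng):
    """Synthetic modules (not Jureczko data): only metric 0 is informative."""
    rows = []
    for _ in range(n):
        defective = rng.randint(0, 1)
        metrics = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        metrics[0] = 5.0 * defective + rng.gauss(0.0, 0.5)
        rows.append((metrics, defective))
    return rows


def subset_accuracy(train, held_out, features):
    """Stand-in scorer (NOT XGBoost): nearest-centroid accuracy on `features`."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]

    def predict(x):
        dist = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        return min(dist, key=dist.get)

    return mean(1.0 if predict(x) == y else 0.0 for x, y in held_out)


def sample_models(train, held_out, n_features, n_samples=200, seed=0):
    """Score one model per randomly sampled feature subset."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        k = rng.randint(1, n_features)
        subset = tuple(sorted(rng.sample(range(n_features), k)))
        results.append((subset, subset_accuracy(train, held_out, subset)))
    return results


def smallest_accurate(results, tolerance=0.01):
    """Pick the smallest subset scoring within `tolerance` of the best score."""
    best = max(score for _, score in results)
    candidates = [(s, a) for s, a in results if a >= best - tolerance]
    # Prefer fewer features; break ties by higher accuracy.
    return min(candidates, key=lambda r: (len(r[0]), -r[1]))


rng = random.Random(42)
train = make_modules(200, 4, rng)
held_out = make_modules(100, 4, rng)
results = sample_models(train, held_out, n_features=4)
chosen = smallest_accurate(results)
print(chosen)  # (feature indices, held-out accuracy) of the selected model
```

In practice the scorer would be swapped for a real boosted-tree model and a proper validation scheme; the selection rule is the part that mirrors the stated idea that features which do not improve predictive power should be left out.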
