Empirical Software Engineering

Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction



Abstract

Effort-aware just-in-time (JIT) defect prediction aims at finding more defective software changes with limited code inspection cost. Traditionally, supervised models have been used; however, they require sufficient labelled training data, which is difficult to obtain, especially for new projects. Recently, Yang et al. proposed an unsupervised model (i.e., LT) and applied it to projects with rich historical bug data. Interestingly, they reported that, under the same inspection cost (i.e., 20 percent of the total lines of code modified by all changes), it could find about 12%-27% more defective changes than a state-of-the-art supervised model (i.e., EALR) under different evaluation settings. This is surprising, as supervised models that benefit from historical data are expected to perform better than unsupervised ones. Their finding suggests that previous studies on defect prediction had made a simple problem too complex. Considering the potentially high impact of Yang et al.'s work, in this paper we perform a replication study and present the following new findings: (1) Under the same inspection budget, LT requires developers to inspect a large number of changes, necessitating many more context switches. (2) Although LT finds more defective changes, many of its highly ranked changes are false alarms; these initial false alarms may negatively impact practitioners' patience and confidence. (3) LT does not outperform EALR when the harmonic mean of Recall and Precision (i.e., the F1-score) is considered. Beyond highlighting the above findings, we propose a simple but improved supervised model called CBS+, which leverages the ideas of both EALR and LT. We investigate the performance of CBS+ under three evaluation settings: time-wise cross-validation, 10 times 10-fold cross-validation, and cross-project validation. Compared with EALR, CBS+ detects about 15%-26% more defective changes while keeping the number of context switches and initial false alarms close to those of EALR. Compared with LT, CBS+ detects a comparable number of defective changes while significantly reducing context switches and initial false alarms before the first success. Finally, we discuss how to balance the trade-off between the number of inspected defects and context switches, and present the implications of our findings for practitioners and researchers.
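For reference, the F1-score mentioned in finding (3) is the standard harmonic mean of Precision and Recall computed over the inspected changes:

    F1 = (2 × Precision × Recall) / (Precision + Recall)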
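To make the effort-aware setup in the abstract concrete, the following minimal sketch ranks changes by a prediction score and inspects them in order until 20% of the total modified LOC is consumed, then reports recall, the number of changes inspected (a rough proxy for context switches), and the false alarms before the first defective change is found. The field names, the scoring rule, and the counting scheme are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of an effort-aware JIT evaluation at a 20% LOC budget.
    # Each change is a dict with 'score', 'loc', and 'defective' (bool).
    def effort_aware_eval(changes, budget=0.20):
        total_loc = sum(c['loc'] for c in changes)
        ranked = sorted(changes, key=lambda c: c['score'], reverse=True)

        inspected_loc = 0
        inspected, hits, initial_false_alarms = 0, 0, 0
        first_hit_seen = False
        for c in ranked:
            # Stop once inspecting the next change would exceed the budget.
            if inspected_loc + c['loc'] > budget * total_loc:
                break
            inspected_loc += c['loc']
            inspected += 1
            if c['defective']:
                hits += 1
                first_hit_seen = True
            elif not first_hit_seen:
                # Clean change ranked before the first defective one.
                initial_false_alarms += 1

        recall = hits / max(1, sum(c['defective'] for c in changes))
        return {'recall_at_20%_loc': recall,
                'changes_inspected': inspected,
                'initial_false_alarms': initial_false_alarms}

An LT-style unsupervised ranking would, for example, set score = 1 / max(1, loc), i.e. inspect the smallest changes first; a supervised model such as EALR or CBS+ would instead supply a learned defect-proneness score.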
