Empirical Software Engineering

Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction



Abstract

Effort-aware just-in-time (JIT) defect prediction aims at finding more defective software changes with limited code inspection cost. Traditionally, supervised models have been used; however, they require sufficient labelled training data, which is difficult to obtain, especially for new projects. Recently, Yang et al. proposed an unsupervised model (i.e., LT) and applied it to projects with rich historical bug data. Interestingly, they reported that, under the same inspection cost (i.e., 20 percent of the total lines of code modified by all changes), it could find about 12%-27% more defective changes than a state-of-the-art supervised model (i.e., EALR) under different evaluation settings. This is surprising, as supervised models that benefit from historical data are expected to perform better than unsupervised ones. Their finding suggests that previous studies on defect prediction had made a simple problem too complex. Considering the potentially high impact of Yang et al.'s work, in this paper we perform a replication study and present the following new findings: (1) Under the same inspection budget, LT requires developers to inspect a large number of changes, necessitating many more context switches. (2) Although LT finds more defective changes, many of its highly ranked changes are false alarms; these initial false alarms may negatively impact practitioners' patience and confidence. (3) LT does not outperform EALR when the harmonic mean of Recall and Precision (i.e., the F1-score) is considered. Beyond highlighting the above findings, we propose a simple but improved supervised model called CBS+, which leverages the ideas of both EALR and LT. We investigate the performance of CBS+ under three evaluation settings: time-wise cross-validation, 10 times 10-fold cross-validation, and cross-project validation. Compared with EALR, CBS+ detects about 15%-26% more defective changes while keeping the number of context switches and initial false alarms close to those of EALR. Compared with LT, CBS+ detects a comparable number of defective changes while significantly reducing context switches and initial false alarms before the first success. Finally, we discuss how to balance the trade-off between the number of inspected defects and context switches, and present the implications of our findings for practitioners and researchers.
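For reference, the F1-score mentioned in finding (3) is the standard harmonic mean of Precision and Recall computed over the inspected changes:

    F1 = (2 × Precision × Recall) / (Precision + Recall)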
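To make the effort-aware setup in the abstract concrete, the following minimal sketch ranks changes by a prediction score and inspects them in order until 20% of the total modified LOC is consumed, then reports recall, the number of changes inspected (a rough proxy for context switches), and the false alarms before the first defective change is found. The field names, the scoring rule, and the counting scheme are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of an effort-aware JIT evaluation at a 20% LOC budget.
    # Each change is a dict with 'score', 'loc', and 'defective' (bool).
    def effort_aware_eval(changes, budget=0.20):
        total_loc = sum(c['loc'] for c in changes)
        ranked = sorted(changes, key=lambda c: c['score'], reverse=True)

        inspected_loc = 0
        inspected, hits, initial_false_alarms = 0, 0, 0
        first_hit_seen = False
        for c in ranked:
            # Stop once inspecting the next change would exceed the budget.
            if inspected_loc + c['loc'] > budget * total_loc:
                break
            inspected_loc += c['loc']
            inspected += 1
            if c['defective']:
                hits += 1
                first_hit_seen = True
            elif not first_hit_seen:
                # Clean change ranked before the first defective one.
                initial_false_alarms += 1

        recall = hits / max(1, sum(c['defective'] for c in changes))
        return {'recall_at_20%_loc': recall,
                'changes_inspected': inspected,
                'initial_false_alarms': initial_false_alarms}

An LT-style unsupervised ranking would, for example, set score = 1 / max(1, loc), i.e. inspect the smallest changes first; a supervised model such as EALR or CBS+ would instead supply a learned defect-proneness score.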
