
Misclassification Cost-Sensitive Software Defect Prediction


Abstract

Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These methods suffer from two practical problems: (i) Most prediction methods rely on a large amount of labeled training data. However, collecting labeled data is difficult and expensive, and it is hard to obtain classification labels for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced. In many real-world applications, the cost of misclassifying a defective module is generally several times higher than that of misclassifying a non-defective one. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two aspects: First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model. This preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, with both overall classification error rate and domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. Experimental results highlight three interesting observations: (1) CSDP achieves higher Normalized Expected Cost of Misclassification (NECM) compared with state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall rate. (3) CSDP enhanced through unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling process.
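
The abstract names NECM as a quality metric but does not give its formula. As a minimal sketch, the Python below assumes one definition commonly used in the defect prediction literature: the false positive rate weighted by the share of non-defective modules, plus the false negative rate weighted by the share of defective modules and by a cost ratio C_II / C_I. The function names, the cost ratio of 10, and the confusion-matrix counts are hypothetical illustrations, not values or code from the paper.

```python
def error_rate(tp, fp, tn, fn):
    """Plain classification error rate: every mistake counts the same."""
    return (fp + fn) / (tp + fp + tn + fn)


def necm(tp, fp, tn, fn, cost_ratio):
    """Normalized expected cost of misclassification (assumed definition).

    cost_ratio is C_II / C_I: how many times more costly it is to miss a
    defective module (false negative) than to flag a clean one (false
    positive). Lower values are better under this definition.
    """
    n_defective = tp + fn
    n_clean = tn + fp
    if n_defective == 0 or n_clean == 0:
        raise ValueError("both classes must be present")
    total = n_defective + n_clean
    fpr = fp / n_clean            # Pr(predicted defective | clean)
    fnr = fn / n_defective        # Pr(predicted clean | defective)
    p_clean = n_clean / total
    p_defective = n_defective / total
    return fpr * p_clean + cost_ratio * fnr * p_defective


if __name__ == "__main__":
    # Hypothetical counts for 100 modules, 10 of them defective.
    model_a = dict(tp=2, fp=1, tn=89, fn=8)    # few errors, misses most defects
    model_b = dict(tp=8, fp=10, tn=80, fn=2)   # more errors, catches most defects
    for name, counts in (("A", model_a), ("B", model_b)):
        print(name, error_rate(**counts), necm(**counts, cost_ratio=10))
```

With these made-up counts, model A has the lower error rate (0.09 against 0.12) but a much higher cost under the assumed definition (roughly 0.81 against 0.30), which illustrates why error rate alone can mislead on imbalanced defect data.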
