Misclassification Cost-Sensitive Software Defect Prediction


Abstract

Software defect prediction helps developers focus on defective modules for efficient software quality assurance. A common goal shared by existing software defect prediction methods is to attain low classification error rates. These proposals suffer from two practical problems: (i) Most of the prediction methods rely on a large amount of labeled training data. However, collecting labeled data is a difficult and expensive task. It is hard to obtain classification labels for new software projects or for existing projects without historical defect data. (ii) Software defect datasets are highly imbalanced. In many real-world applications, the misclassification cost of defective modules is generally several times higher than that of non-defective ones. In this paper, we present a misclassification Cost-sensitive approach to Software Defect Prediction (CSDP). The CSDP approach is novel in two aspects: First, CSDP addresses the problem of unlabeled software defect datasets by combining an unsupervised sampling method with a domain-specific misclassification cost model. This preprocessing step selectively samples a small percentage of modules by estimating their classification labels. Second, CSDP builds a cost-sensitive support vector machine model to predict the defect-proneness of the remaining modules, with both overall classification error rate and domain-specific misclassification cost as quality metrics. CSDP is evaluated on four NASA projects. Experimental results highlight three interesting observations: (1) CSDP achieves higher Normalized Expected Cost of Misclassification (NECM) compared with state-of-the-art supervised learning models under imbalanced training data with limited labeling. (2) CSDP outperforms state-of-the-art semi-supervised learning methods, which disregard classification costs, especially in recall rate. (3) CSDP enhanced through unsupervised sampling as a preprocessing step prior to training and prediction outperforms the baseline CSDP without the sampling process.
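The abstract names NECM as a quality metric but does not define it. As a rough illustration only, the sketch below computes NECM under one definition commonly used in the defect prediction literature: false alarms count once and missed defects are weighted by the ratio of the two misclassification costs, all divided by the number of modules. The function name `necm`, the label encoding (1 = defective, 0 = non-defective), and the cost ratio of 5 are assumptions for this example, not values taken from the paper. Under this definition a smaller value means a lower expected cost.

```python
def necm(y_true, y_pred, cost_ratio):
    """Normalized expected cost of misclassification (illustrative definition).

    y_true, y_pred: sequences of labels, 1 = defective, 0 = non-defective
                    (assumed encoding).
    cost_ratio:     cost of missing a defective module divided by the cost
                    of a false alarm (assumed parameter).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one module is required")

    # False alarm: non-defective module predicted as defective.
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Missed defect: defective module predicted as non-defective.
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return (false_pos + cost_ratio * false_neg) / len(y_true)


# Ten modules, two of them defective; one missed defect and one false alarm.
actual    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(necm(actual, predicted, cost_ratio=5.0))
```

With a cost ratio above 1, a missed defect contributes more to the score than a false alarm, which is why a cost-sensitive model can accept more false alarms in exchange for higher recall.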


Bibliographic record

Source
2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science | 2018 | pp. 256-263 | 8 pages
Conference location: Salt Lake City (US)
Authors
Ling Xu; Bei Wang; Ling Liu; Mo Zhou; Shengping Liao; Meng Yan
Original format: PDF
Language: English
Keywords
Software; Measurement; Training; Predictive models; Training data; Cancer; Sampling methods;


Similar literature


1. Discriminating features-based cost-sensitive approach for software defect prediction [J]. Aftab Ali, Naveed Khan, Mamun Abu-Tair. Automated Software Engineering. 2021, No. 2

2. CSSG: A cost-sensitive stacked generalization approach for software defect prediction [J]. Eivazpour Zeinab, Keyvanpour Mohammad Reza. Software Testing, Verification and Reliability. 2021, No. 5

3. Cost-sensitive Dictionary Learning for Software Defect Prediction [J]. Liang Niu, Jianwu Wan, Hongyuan Wang. Neural Processing Letters. 2020, No. 3

4. Misclassification Cost-Sensitive Software Defect Prediction [C]. Ling Xu, Bei Wang, Ling Liu. IEEE International Conference on Information Reuse and Integration. 2018

5. A Software Metrics Clustering Approach to Cross-Project Defect Prediction [D]. Sezer, Anil. 2019

6. Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction [O]. P. Kumudha, R. Venkatesan. 2016

7. Cost-Sensitive and Sparse Ladder Network for Software Defect Prediction [O]. Jing Sun, Yi-mu Ji, Shangdong Liu. 2020
