Journal: International Journal of Software Engineering and Knowledge Engineering

Software Defect Prediction Based on Cost-Sensitive Dictionary Learning


Abstract

Software defect prediction is widely used to improve the quality of software systems. Most real software defect datasets contain fewer defective modules than defect-free modules, and such highly class-imbalanced data typically make accurate prediction difficult: a model trained on them easily misclassifies a defective module as a defect-free one. Because different software modules are similar to one another, a module can be represented by sparse representation coefficients over a pre-defined dictionary built from historical software defect data. In this study, we use a dictionary learning method to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, to ensure that the extracted features (the sparse representation) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account: we increase the penalty for misclassifying defective modules during dictionary learning, which biases the classifier toward labeling a module as defective. We thus propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
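The two ingredients named in the abstract, elastic-net sparse coding over a dictionary and a decision biased by unequal misclassification costs, can be illustrated with a minimal sketch. This is not the authors' CSDL algorithm: the abstract does not give its objective or update rules, so the ISTA solver, the residual-based decision rule, and all names and default values below (`elastic_net_code`, `cost_sensitive_predict`, `lam1`, `lam2`, `cost_fn`, `cost_fp`) are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_code(x, D, lam1=0.1, lam2=0.1, n_iter=500):
    """Approximately minimize
        0.5 * ||x - D a||^2 + lam1 * ||a||_1 + 0.5 * lam2 * ||a||^2
    over a, using ISTA (proximal gradient descent). Illustrative solver only."""
    a = np.zeros(D.shape[1])
    # Lipschitz constant of the gradient of the smooth part of the objective.
    L = np.linalg.norm(D, 2) ** 2 + lam2
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x) + lam2 * a
        a = soft_threshold(a - grad / L, lam1 / L)
    return a

def cost_sensitive_predict(x, D_defective, D_clean, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (defective) or 0 (defect-free).

    Assumed heuristic: code x over one sub-dictionary per class, then compare
    reconstruction residuals scaled by the misclassification costs. A larger
    cost_fn (cost of missing a defect) makes the 'defective' label more likely."""
    r_def = np.linalg.norm(x - D_defective @ elastic_net_code(x, D_defective))
    r_clean = np.linalg.norm(x - D_clean @ elastic_net_code(x, D_clean))
    return int(cost_fp * r_def <= cost_fn * r_clean)

# Toy dictionaries (hypothetical data): each column is one atom.
D_defective = np.eye(4)[:, :2]
D_clean = np.eye(4)[:, 2:]

# A module that resembles both classes, slightly closer to the clean atoms.
x = np.array([0.5, 0.0, 0.6, 0.0])
print(cost_sensitive_predict(x, D_defective, D_clean, cost_fn=1.0))  # equal costs
print(cost_sensitive_predict(x, D_defective, D_clean, cost_fn=5.0))  # defects costlier
```

In this toy setup the ambiguous sample is labeled defect-free under equal costs and defective once missing a defect is assumed to be five times as costly, which is the kind of bias the abstract describes. The method in the abstract applies the cost weighting during dictionary learning rather than only at decision time.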
