Journal: Automated Software Engineering

Label propagation based semi-supervised learning for software defect prediction

Abstract

Software defect prediction automatically identifies defect-prone software modules so that software testing can be carried out efficiently. When few modules have known defect labels, predicting the defect-prone ones becomes a challenging problem. Static software defect prediction has two notable characteristics. First, software modules are similar to one another, so a module can be approximated by a sparse representation of the other modules. Second, the classes are imbalanced: the number of defect-free modules is much larger than the number of defective ones. In this paper, we propose a graph-based semi-supervised learning technique for predicting software defects. First, we apply a Laplacian score sampling strategy to the labeled defect-free modules to construct a class-balanced labeled training dataset. Next, we use a nonnegative sparse algorithm to compute the nonnegative sparse weights of a relationship graph, which serve as clustering indicators. Finally, we run a label propagation algorithm on the nonnegative sparse graph to iteratively predict the labels of the unlabeled software modules. The resulting nonnegative sparse graph based label propagation (NSGLP) approach for software defect classification and prediction uses not only a small amount of labeled data but also abundant unlabeled data to improve generalization capability. We vary the proportion of labeled software modules from 10% to 30% across all the datasets from the widely used NASA projects. Experimental results show that NSGLP outperforms several representative state-of-the-art semi-supervised software defect prediction methods, and that it can fully exploit the characteristics of static code metrics and improve the generalization capability of the software defect prediction model.
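
The label propagation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's update rule, so the code below uses a generic clamped label propagation scheme, not necessarily the one in NSGLP. The function name `propagate_labels`, the toy weight matrix `W`, and the label encoding (1 = defective, 0 = defect-free, `None` = unlabeled) are all assumptions made for illustration. The weights are hand-written here; in the paper they would come from the nonnegative sparse algorithm.

```python
def propagate_labels(weights, labels, n_iter=100, tol=1e-9):
    """Generic clamped label propagation (illustrative, not the paper's exact rule).

    weights: n x n list of lists of nonnegative edge weights (hypothetical input).
    labels:  list of 1 (defective), 0 (defect-free), or None (unlabeled).
    Returns a list of scores in [0, 1]; higher means more likely defective.
    """
    n = len(weights)

    # Row-normalize the weights so each row sums to 1 (rows of zeros stay zero).
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    # Labeled modules start at their label; unlabeled modules start at 0.5.
    scores = [float(l) if l is not None else 0.5 for l in labels]

    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp: labeled modules keep their known label.
                new_scores.append(float(labels[i]))
            elif sum(trans[i]) == 0:
                # Isolated module: no neighbors to learn from, keep its score.
                new_scores.append(scores[i])
            else:
                # Unlabeled module: weighted average of its neighbors' scores.
                new_scores.append(sum(trans[i][j] * scores[j] for j in range(n)))
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy graph of five modules (made-up weights, not data from the paper).
# Row i holds the nonnegative weights with which module i is represented
# by the other modules; most entries are zero, i.e. the graph is sparse.
W = [
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],  # module 2 is represented only by module 0
    [0, 1, 0, 0, 0],  # module 3 is represented only by module 1
    [3, 1, 0, 0, 0],  # module 4 leans 3:1 toward module 0
]
labels = [1, 0, None, None, None]

scores = propagate_labels(W, labels)
predicted = [1 if s >= 0.5 else 0 for s in scores]
```

In this toy graph, module 2 inherits the defective label from module 0, module 3 inherits the defect-free label from module 1, and module 4 receives an intermediate score weighted toward module 0. The 0.5 threshold used for `predicted` is likewise an illustrative choice.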
