Journal: Automated Software Engineering

Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction



Abstract

Cross-project defect prediction (CPDP) refers to predicting defects in a target project using prediction models trained on historical data from other source projects. When the source and target projects have different metric sets, CPDP is called heterogeneous defect prediction (HDP). Recently, HDP has received much research interest. Existing HDP methods consider only the linear correlation among the features (metrics) of the source and target projects, and such models cannot capture nonlinear correlations among the features. These methods may therefore suffer from the linearly inseparable problem in the linear feature space. Furthermore, existing HDP methods do not take the class imbalance problem into consideration, even though the imbalanced nature of software defect datasets makes learning harder for the predictors. In this paper, we propose a new cost-sensitive transfer kernel canonical correlation analysis (CTKCCA) approach for HDP. CTKCCA not only makes the data distributions of the source and target projects much more similar in the nonlinear feature space, where the learned features have favorable separability, but also uses different misclassification costs for the defective and defect-free classes to alleviate the class imbalance problem. For the evaluation, we perform the Friedman test with Nemenyi's post-hoc test and the Cliff's delta effect size test. Extensive experiments on 28 public projects from five data sources indicate that (1) CTKCCA performs significantly better than the related CPDP methods and (2) CTKCCA performs better than the related state-of-the-art HDP methods.
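The abstract names Cliff's delta as the effect size used in the evaluation. As a rough illustration only, the standard definition of that statistic can be sketched as below. The function name `cliffs_delta` and the sample scores are hypothetical and are not taken from the paper.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: (#(x > y) - #(x < y)) / (len(xs) * len(ys)).

    Ranges from -1 (every x is below every y) to +1 (every x is above every y).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    pairs = list(product(xs, ys))
    greater = sum(1 for x, y in pairs if x > y)
    less = sum(1 for x, y in pairs if x < y)
    return (greater - less) / len(pairs)


# Hypothetical per-run scores for two methods (illustrative values only).
method_a = [0.7, 0.8]
method_b = [0.6, 0.75]
delta = cliffs_delta(method_a, method_b)
```

A value near 0 means the two sets of scores overlap heavily. Values near +1 or -1 mean one method's scores almost always exceed the other's.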
