Conference: International Symposium on Innovation in Information and Communication Technology

Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study

Abstract

Cross-project defect prediction (CPDP) trains a prediction model on data from other projects and then tests it on the target project's data. Data drawn from different projects produces a highly imbalanced source dataset, with an imbalance between the defect-prone and non-defect-prone classes. This imbalance degrades the performance of the predictive model. This paper presents a two-fold empirical analysis. First, it evaluates whether data sampling with the SMOTE algorithm can improve the performance of the predictive model for different categories of CPDP. Second, it examines whether this technique, applied in CPDP, is comparable to within-project defect prediction (WPDP). An ensemble learning classifier, Gradient Boosting, is used as the predictive model over 16 publicly available datasets. The experimental results indicate that the SMOTE algorithm can be applied to overcome the class imbalance problem in different categories of CPDP. In addition, it gives results comparable to WPDP, with statistical significance.
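The abstract names SMOTE but does not show how it works. The sketch below illustrates the core idea: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority neighbours. This is not the paper's implementation. The function names `smote` and `balance`, the label convention (1 for defect-prone, 0 otherwise), the default `k=5`, and the choice to oversample until the classes are equal in size are all assumptions made for illustration. The sketch also assumes that the defect-prone class is the minority.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(source_rows, source_labels, k=5, seed=0):
    """Oversample label 1 (assumed minority) until both classes are equal in size."""
    minority = [r for r, y in zip(source_rows, source_labels) if y == 1]
    majority_count = sum(1 for y in source_labels if y == 0)
    needed = max(0, majority_count - len(minority))
    if needed == 0:
        return list(source_rows), list(source_labels)
    new_rows = smote(minority, needed, k=k, seed=seed)
    return source_rows + new_rows, source_labels + [1] * len(new_rows)
```

In a CPDP setting, a function like `balance` would typically be applied only to the source-project training data, leaving the target project's data untouched so that evaluation reflects the real class distribution. The balanced data would then be passed to the classifier, which the abstract identifies as Gradient Boosting.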