Conference: International Symposium on Innovation in Information and Communication Technology

Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study


Abstract

Cross-project defect prediction (CPDP) trains a prediction model on data from other projects and then tests it on the target project's data. Data drawn from different projects yields a highly imbalanced source dataset: the defect-prone and non-defect-prone classes are unevenly represented, which in turn degrades the performance of the predictive model. This paper performs a two-fold empirical analysis. First, it evaluates whether data sampling with the SMOTE algorithm can improve the performance of the predictive model for different categories of CPDP. Second, it examines whether this technique, applied in CPDP, is comparable to within-project defect prediction (WPDP). An ensemble learning classifier, Gradient Boosting, is used as the predictive model over 16 publicly available datasets. The experimental results indicate that SMOTE can be applied to overcome the class imbalance problem in different categories of CPDP, and that it gives results comparable to WPDP with statistical significance.
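To make the sampling step concrete, the following is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the authors' implementation, which the abstract does not describe. The function names `smote` and `balance`, the toy feature vectors, and the choice of `k` are all hypothetical, and the sketch assumes that resampling is applied only to the source (training) data. Training a Gradient Boosting classifier on the balanced data and testing it on the target project is left out.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen one.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical source-project data: 8 non-defective modules (0), 3 defective (1).
X_source = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 1.5],
            [2.2, 2.0], [1.1, 1.9], [1.6, 2.4],
            [6.0, 7.0], [6.5, 7.5], [7.0, 6.8]]
y_source = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X_source, y_source, k=2)
print(y_bal.count(0), y_bal.count(1))  # prints "8 8"
```

Each synthetic point lies on the line segment between two existing defect-prone samples, so the minority class grows without duplicating rows exactly. In practice a maintained library implementation would normally be preferred over a hand-written one.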
