Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study

Abstract

In cross-project defect prediction (CPDP), the prediction model is trained on data from other projects and then tested on the target project's data. Pooling data from different projects yields a highly imbalanced source dataset: the defect-prone class is much smaller than the non-defect-prone class, which degrades the performance of the predictive model. This paper presents a two-part empirical analysis. First, it evaluates whether data sampling with the SMOTE algorithm can improve the performance of the predictive model for different categories of CPDP. Second, it examines whether this technique, applied in CPDP, gives results comparable to within-project defect prediction (WPDP). An ensemble learning classifier, Gradient Boosting, is used as the predictive model on 16 publicly available datasets. The experimental results indicate that the SMOTE algorithm can be applied to overcome the class imbalance problem in different categories of CPDP. In addition, it gives results comparable to WPDP, with statistical significance.
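The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The code below is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k=5`, the choice to oversample until both classes are the same size, and the two-metric toy data are all assumptions made for illustration, and the Gradient Boosting classifier that would be trained afterwards is left out.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(features, labels, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size
    (an assumed target ratio; the abstract does not state one)."""
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    deficit = (len(labels) - len(minority)) - len(minority)
    if deficit <= 0:
        return list(features), list(labels)
    new_points = smote(minority, deficit, k=k, seed=seed)
    return (
        list(features) + new_points,
        list(labels) + [minority_label] * len(new_points),
    )


# Hypothetical source-project data: (lines of code, complexity) per module,
# label 1 = defect-prone, label 0 = not defect-prone.
source_X = [(10.0, 1.0), (12.0, 2.0), (11.0, 1.5), (9.0, 1.0),
            (13.0, 2.5), (10.5, 1.2), (40.0, 9.0), (42.0, 8.0)]
source_y = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_X, balanced_y = balance(source_X, source_y)
print(balanced_y.count(0), balanced_y.count(1))
```

In a CPDP setting, a step like this would be applied only to the source (training) projects, so that the target project's data stays untouched for testing.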

Bibliographic record

Source
International Symposium on Innovation in Information and Communication Technology | 2018 | 64p | 6 pages
Authors
Lipika Goel; Mayank Sharma; Sunil Kumar Khatri; D. Damodaran
Original format: PDF
Chinese Library Classification: TN91-53
Keywords
Predictive models; Java; Boosting; Training; Data models; Testing; Measurement

Similar literature

1. Cross-project defect prediction using data sampling for class imbalance learning: an empirical study [J]. Goel Lipika, Sharma Mayank, Khatri Sunil Kumar. International Journal of Parallel, Emergent and Distributed Systems. 2021, Issue 1a2
2. A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction [J]. IEEE Transactions on Reliability. 2020, Issue 1
3. An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems [J]. Xiao-Yuan Jing, Fei Wu, Xiwei Dong. IEEE Transactions on Software Engineering. 2017, Issue 4
4. Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study [C]. Lipika Goel, Mayank Sharma, Sunil Kumar Khatri. International Symposium on Innovation in Information and Communication Technology. 2018
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Bloodgood, Michael. 2009
6. Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques [O]. Bilal Khan, Rashid Naseem, Muhammad Arif Shah. 2021
7. An Empirical Study of Classifier Combination for Cross-Project Defect Prediction [O]. Yun Zhang, David Lo, Xin Xia. 2016
