Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study

Abstract

Cross-project defect prediction (CPDP) is an approach in which the prediction model is trained on data from different (source) projects and then tested on the target project's data. Pooling data from different projects produces a highly imbalanced source dataset: there is an imbalance between the defect-prone and non-defect-prone classes, which in turn degrades the performance of the predictive model. This paper performs a two-fold empirical analysis. First, it evaluates whether data sampling with the SMOTE algorithm can improve the performance of the predictive model for different categories of CPDP. Second, it examines whether this technique, when applied in CPDP, gives results comparable to within-project defect prediction (WPDP). An ensemble learning classifier, Gradient Boosting, is used as the predictive model over 16 publicly available datasets. The experimental results indicate that the SMOTE algorithm can be applied to overcome the problem of class imbalance in different categories of CPDP. In addition, it gives results comparable to WPDP, with statistical significance.
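The abstract names SMOTE (Synthetic Minority Over-sampling Technique) as the data-sampling step. As a rough illustration only, the following pure-Python sketch shows the core SMOTE idea: new minority-class (for example, defect-prone) samples are created by interpolating between a minority sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function names `smote` and `balance_with_smote`, the random choice of base samples, Euclidean distance, the default k = 5 (a common choice) and the toy data are all assumptions, and the Gradient Boosting classifier is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points built by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority
    class. Each new point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Hypothetical helper, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)        # seeded so results are reproducible
    k = min(k, len(minority) - 1)    # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to base, keep the k nearest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()           # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary dataset until both classes
    have equal counts. Returns new lists; original samples come first.
    Hypothetical helper for illustration."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    n_needed = max(0, n_majority - len(minority))
    new = smote(minority, n_needed, k=k, seed=seed) if n_needed else []
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    # Hypothetical pooled source-project metrics: 3 defect-prone (1), 7 clean (0).
    X_src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
             [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0],
             [5.5, 5.5], [5.2, 5.8], [5.8, 5.2]]
    y_src = [1, 1, 1] + [0] * 7
    X_bal, y_bal = balance_with_smote(X_src, y_src, k=2)
    print(y_bal.count(1), y_bal.count(0))  # prints: 7 7
```

In a CPDP setup along the lines the abstract describes, such oversampling would presumably be applied only to the pooled source-project training data, leaving the target project's data untouched for testing. In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be preferred over a hand-rolled version like this one.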

Bibliographic details

Source: International Symposium on Innovation in Information and Communication Technology | 2018 | pp. 1-6 | 6 pages
Conference location: Amman (Jordan)
Authors: Lipika Goel; Mayank Sharma; Sunil Kumar Khatri; D. Damodaran
Affiliations:
Research Scholar, Amity University, Noida, India;
Amity Institute of Information Technology, Amity University, Noida, India;
Centre for Reliability, Thiruvanmiyur, Chennai, India;
Original format: PDF
Keywords: Predictive models; Java; Boosting; Training; Data models; Testing; Measurement

机译：预测模型； Java;助推;训练;数据模型；测试；测量;

Similar literature

1. Cross-project defect prediction using data sampling for class imbalance learning: an empirical study [J]. Lipika Goel, Mayank Sharma, Sunil Kumar Khatri. International Journal of Parallel, Emergent and Distributed Systems, 2021, issue 1a2.
2. A Novel Class-Imbalance Learning Approach for Both Within-Project and Cross-Project Defect Prediction [J]. IEEE Transactions on Reliability, 2020, issue 1.
3. An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems [J]. Xiao-Yuan Jing, Fei Wu, Xiwei Dong. IEEE Transactions on Software Engineering, 2017, issue 4.
4. Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction: An Empirical Study [C]. Lipika Goel, Mayank Sharma, Sunil Kumar Khatri. International Symposium on Innovation in Information and Communication Technology, 2018.
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Michael Bloodgood, 2009.
6. Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques [O]. Bilal Khan, Rashid Naseem, Muhammad Arif Shah, 2021.
7. An Empirical Study of Classifier Combination for Cross-Project Defect Prediction [O]. Yun Zhang, David Lo, Xin Xia, 2016.