Impact of Data Sampling on Feature Selection Techniques for Software Defect Prediction

Abstract

In software quality modeling, two problems often come with a software training dataset: (1) high dimensionality and (2) imbalanced distributions between the two classes (fault-prone and not-fault-prone modules). To overcome these problems, an effective method is to perform feature selection and data sampling prior to building classifiers for software quality prediction. In this study, we investigate 18 filter-based feature ranking techniques and three data sampling approaches, and compare the similarity between each pair of filters with respect to different sampling techniques. We also compare the prediction performance when using every combination of filter and sampling method. The experimental results demonstrate that data sampling increases the similarity between two feature ranking techniques on average and improves the classification performance when combined with feature selection approaches.
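
The abstract does not name the individual filters or sampling methods, so the following is only a minimal sketch of the kind of comparison it describes, under stated assumptions: random undersampling stands in for one sampling approach, two simple scores (a signal-to-noise ratio and the absolute Pearson correlation) stand in for two of the 18 filters, and similarity is measured as the overlap of the top-k features. The synthetic data and all function names are hypothetical and are not taken from the paper.

```python
import random
import statistics


def make_dataset(n_major=200, n_minor=20, n_features=8, seed=0):
    """Synthetic imbalanced data (assumption): label 1 = fault-prone minority."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, count in ((0, n_major), (1, n_minor)):
        for _ in range(count):
            # Only the first three features carry class signal.
            row = [rng.gauss(label * 1.5 if j < 3 else 0.0, 1.0)
                   for j in range(n_features)]
            rows.append(row)
            labels.append(label)
    return rows, labels


def random_undersample(rows, labels, seed=0):
    """Drop majority rows (label 0) at random until both classes match in size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def signal_to_noise(column, labels):
    """Filter 1 (assumption): |mean difference| / sum of class std deviations."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread


def abs_correlation(column, labels):
    """Filter 2 (assumption): absolute Pearson correlation with the label."""
    mx, my = statistics.mean(column), statistics.mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in labels) ** 0.5
    return abs(cov / (sx * sy))


def top_k(rows, labels, score, k):
    """Rank features by a filter score and return the indices of the best k."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: score(columns[j], labels), reverse=True)
    return set(ranked[:k])


def overlap(a, b):
    """Fraction of features shared by two equally sized subsets."""
    return len(a & b) / len(a)


rows, labels = make_dataset()
variants = {"raw": (rows, labels),
            "undersampled": random_undersample(rows, labels)}
for name, (X, y) in variants.items():
    s1 = top_k(X, y, signal_to_noise, 3)
    s2 = top_k(X, y, abs_correlation, 3)
    print(name, "top-3 overlap between filters:", overlap(s1, s2))
```

Comparing the overlap printed for the raw and the undersampled data mirrors, on a toy scale, the pairwise filter comparison the study reports. A single synthetic run like this says nothing about the paper's actual findings.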

Bibliographic record

Source: ISSAT international conference on reliability quality in design, 2012, pp. 91-95 (5 pages)
Authors: Kehan Gao; Taghi M. Khoshgoftaar; Amri Napolitano
Original format: PDF

