Conference: ISSAT International Conference on Reliability and Quality in Design

Impact of Data Sampling on Feature Selection Techniques for Software Defect Prediction

Abstract

In software quality modeling, two problems often come with a software training dataset: (1) high dimensionality and (2) imbalanced distributions between the two classes (fault-prone and not-fault-prone modules). To overcome these problems, an effective method is to perform feature selection and data sampling prior to building classifiers for software quality prediction. In this study, we investigate 18 filter-based feature ranking techniques and three data sampling approaches, and compare the similarity between each pair of filters with respect to different sampling techniques. We also compare the prediction performance when using every combination of filter and sampling method. The experimental results demonstrate that data sampling increases the similarity between two feature ranking techniques on average and improves the classification performance when combined with feature selection approaches.
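The workflow the abstract describes (sample the training data, rank features with a filter, then compare how similar two filters' rankings are) can be illustrated with a minimal sketch. The abstract does not name the 18 filters, the three sampling approaches, or the similarity measure, so everything below is an assumption chosen for illustration: random undersampling stands in for the sampling step, a standardized mean-gap score and a per-feature AUC score stand in for two filters, and top-k overlap stands in for the similarity measure. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n=200, n_faulty=20, n_features=6, seed=0):
    """Build an imbalanced toy dataset; only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    y = [1] * n_faulty + [0] * (n - n_faulty)  # 1 = fault-prone module
    X = [[rng.gauss(2.0 * label if j < 2 else 0.0, 1.0)
          for j in range(n_features)] for label in y]
    return X, y


def random_undersample(X, y, seed=0):
    """Assumed sampling step: drop majority rows at random until classes match."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def rank_by_mean_gap(X, y):
    """Assumed filter 1: |mean(class 1) - mean(class 0)| / overall std."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # guard against constant features
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def rank_by_auc(X, y):
    """Assumed filter 2: AUC obtained by using the raw feature as a score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
        auc = wins / (len(pos) * len(neg))
        scores.append(max(auc, 1.0 - auc))  # direction of the effect is ignored
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def top_k_overlap(rank_a, rank_b, k):
    """Assumed similarity: fraction of features shared by the two top-k lists."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


if __name__ == "__main__":
    X, y = make_toy_data()
    Xs, ys = random_undersample(X, y)
    for name, (data, labels) in {"no sampling": (X, y),
                                 "undersampled": (Xs, ys)}.items():
        sim = top_k_overlap(rank_by_mean_gap(data, labels),
                            rank_by_auc(data, labels), k=3)
        print(f"{name}: top-3 overlap between the two filters = {sim:.2f}")
```

Running the script prints the top-3 overlap of the two stand-in filters with and without sampling. On a single toy dataset the two numbers say nothing about the paper's finding; a comparison like the one in the abstract would repeat this over many filter pairs, sampling methods, and real defect datasets, and then train classifiers on the selected features.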