Journal: Applied Sciences

Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach


Abstract

Software Defect Prediction (SDP) models are built using software metrics derived from software systems. The quality of SDP models depends largely on the quality of the software metrics (dataset) used to build them. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP is still a problem, as most empirical studies on FS methods for SDP produce contradictory and inconsistent quality outcomes. FS methods behave differently because of their different underlying computational characteristics. This could be due to the search methods used in FS, because the impact of FS depends on the choice of search method. It is hence imperative to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that the application of FS improves the predictive performance of classifiers and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain demonstrated the greatest improvements in the performance of the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the best influence on the prediction models. However, prediction models based on FFR proved to be more stable than those based on FSS methods. Hence, we conclude that FS methods improve the performance of SDP models, and that there is no single best FS method, as their performance varied according to the dataset and the choice of prediction model. However, we recommend the use of FFR methods, as the prediction models based on FFR are more stable in terms of predictive performance.
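To make the idea of filter feature ranking concrete, the following is a minimal sketch of ranking features by Information Gain, the FFR method the abstract reports as most effective. It is not the authors' implementation or experimental setup: it assumes features have already been discretized, and the feature names, rows, and labels are hypothetical toy data rather than the NASA datasets.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    columns = list(zip(*rows))
    scores = [(name, information_gain(col, labels)) for name, col in zip(names, columns)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each row is one software module with discretized metrics.
NAMES = ["high_complexity", "large_loc", "has_comments"]
ROWS = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 0),
    (0, 0, 1),
    (0, 0, 0),
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = defective, 0 = clean

RANKING = rank_features(ROWS, LABELS, NAMES)
TOP_K = [name for name, _ in RANKING[:2]]  # keep the two highest-ranked features
```

A ranking method like this scores each feature independently and keeps the top k, whereas subset selection (FSS) methods search over combinations of features, which is where the choice of search method comes into play.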
