Journal: International Journal of Reliability, Quality and Safety Engineering

WRAPPER-BASED FEATURE RANKING TECHNIQUES FOR DETERMINING RELEVANCE OF SOFTWARE ENGINEERING METRICS



Abstract

Classification, an important data mining function that assigns class labels to items in a collection, has practical applications in various domains. In software engineering, for instance, a common classification problem is to determine the quality of a software item. In such a problem, software metrics serve as the independent features and fault proneness serves as the class label. In many classification problems, one must deal with irrelevant features in the feature space. That, coupled with class imbalance, makes discriminating one class from another rather difficult. In this study, we empirically evaluate our proposed wrapper-based feature ranking, in which nine performance metrics, each aided by a particular learner and a methodology, are considered. We examine five learners and take three different approaches, each in conjunction with one of three methodologies: 3-fold Cross-Validation, 3-fold Cross-Validation Risk Impact, and a combination of the two. We consider two sets of software engineering datasets. To evaluate classifier performance after feature selection has been applied, we use the Area Under the Receiver Operating Characteristic curve (AUC) as the performance evaluator. We investigate the performance of feature selection as we vary the three factors that form the foundation of wrapper-based feature ranking. We show that performance depends not only on the choice of methodology but also on the learner. We also evaluate the effect of sampling on wrapper-based feature ranking. Finally, we provide guidance on which software metrics are relevant in software defect prediction problems and on how many software metrics to select when using wrapper-based feature ranking.
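The general idea of wrapper-based feature ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learner (`centroid_scorer`, a hypothetical nearest-class-mean scorer), the use of AUC as the ranking metric, the stratified fold assignment, and the toy data are all assumptions made for illustration. The abstract does not name the five learners or the nine performance metrics, and the Risk Impact methodology is not reproduced here. Each feature is scored on its own by training the learner on that single feature under plain 3-fold cross-validation, and features are then ordered by their mean score.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; treated as chance level
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(train_x, train_y):
    """Hypothetical stand-in learner: higher score means closer to the positive-class mean."""
    mu_pos = mean(x for x, y in zip(train_x, train_y) if y == 1)
    mu_neg = mean(x for x, y in zip(train_x, train_y) if y == 0)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)


def stratified_folds(labels, k=3, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def rank_features(X, y, k=3, seed=0):
    """Rank features by the mean k-fold AUC of a learner trained on each feature alone."""
    folds = stratified_folds(y, k, seed)
    results = []
    for f in range(len(X[0])):
        fold_aucs = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            score = centroid_scorer([X[i][f] for i in train], [y[i] for i in train])
            fold_aucs.append(
                auc([score(X[i][f]) for i in test_idx], [y[i] for i in test_idx])
            )
        results.append((f, mean(fold_aucs)))
    # Stable sort: ties keep the original feature order.
    return sorted(results, key=lambda t: t[1], reverse=True)


# Toy, imbalanced data (made up): feature 0 separates the classes, feature 1 does not track the label.
y = [1] * 6 + [0] * 18
X = [[10.0 * y[i] + (i % 3), float(i % 4)] for i in range(len(y))]
ranking = rank_features(X, y)
for feature_index, mean_auc in ranking:
    print(f"feature {feature_index}: mean AUC = {mean_auc:.3f}")
```

Swapping `centroid_scorer` for another learner, or `auc` for another performance metric, changes the ranking criterion without touching the loop, which is the sense in which the ranking depends on the learner, the metric, and the methodology.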
