World Congress on Engineering and Computer Science

Feature Selection Using Decision Tree Induction in Class level Metrics Dataset for Software Defect Predictions



Abstract

The importance of software testing for quality assurance cannot be overemphasized. The estimation of quality factors is important for minimizing the cost and improving the effectiveness of the software testing process. One such quality factor is fault proneness, for which, unfortunately, no generalized technique is available for effective identification. Many researchers have concentrated on how to select software metrics that are likely to indicate fault proneness. At the same time, dimensionality reduction (feature selection over software metrics) plays a vital role in the effectiveness and quality of the resulting model. Feature selection is important for a variety of reasons, such as generalization, performance, computational efficiency, and feature interpretability. In this paper, a new feature selection method based on decision tree induction is proposed. Relevant features are selected from the class level metrics dataset using the decision tree classifiers employed in the classification process. The attributes that form the rules of these classifiers are taken as the relevant feature set, termed the Decision Tree Induction Rule Based (DTIRB) feature set. Different classifiers are then learned on the new dataset obtained through the decision tree induction process and achieve better performance. The performance of 18 classifiers is studied with the proposed method, and a comparison is made with the Support Vector Machine (SVM) and RELIEF feature selection techniques. The proposed method is observed to outperform the other two for most of the classifiers considered. An overall improvement in the classification process is also found with both the original and the reduced feature sets. The proposed method has the advantage of easy interpretability and comprehensibility. A class level metrics dataset is used to evaluate the performance of the model, with Receiver Operating Characteristic (ROC) curves, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) used as the performance measures for checking the effectiveness of the model.
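
The pipeline described in the abstract can be illustrated with a short, hypothetical sketch (not the authors' implementation): a decision tree is induced on the full metric set, the attributes that appear in its split rules are kept as the DTIRB feature subset, a downstream classifier is learned on that subset, and ROC, MAE, and RMSE are reported. scikit-learn, a synthetic stand-in for the class level metrics dataset, and the choice of Naive Bayes as the downstream classifier are assumptions here.

```python
# Hypothetical sketch of DTIRB-style feature selection; synthetic data stands
# in for the class level metrics dataset used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, mean_absolute_error, mean_squared_error

# Stand-in for class level metrics with a fault-proneness label.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: induce a decision tree on the full feature set.
dt = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Step 2: keep only the attributes that occur in the tree's decision rules
# (internal split nodes); leaf nodes carry a negative feature index.
selected = np.unique(dt.tree_.feature[dt.tree_.feature >= 0])
print("DTIRB feature subset:", selected)

# Step 3: learn a downstream classifier on the reduced feature set.
clf = GaussianNB().fit(X_tr[:, selected], y_tr)
proba = clf.predict_proba(X_te[:, selected])[:, 1]

# Step 4: evaluate with the measures named in the abstract: ROC (area under
# the curve), MAE, and RMSE, computed here on the predicted probabilities.
print("ROC AUC:", roc_auc_score(y_te, proba))
print("MAE    :", mean_absolute_error(y_te, proba))
print("RMSE   :", np.sqrt(mean_squared_error(y_te, proba)))
```

The design choice being mimicked is that attributes never used in any rule of the induced tree are treated as irrelevant to fault proneness and are dropped before the other classifiers are learned.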
