Journal: IEEE Transactions on Reliability

Empirical Studies of a Two-Stage Data Preprocessing Approach for Software Fault Prediction

Abstract

Software fault prediction is a valuable exercise in software quality assurance to best allocate limited testing resources. Classification is one of the effective methods for software fault prediction. The classification models are trained based on the datasets obtained by mining software historical repositories. However, the performance of the models depends on the quality of datasets. In this paper, we propose a novel two-stage data preprocessing approach which incorporates both feature selection and instance reduction. Specifically, in the feature selection stage, we first perform relevance analysis, and then propose a threshold-based clustering method, called novel threshold-based clustering algorithm, to conduct redundancy control. In the instance reduction stage, we apply random under-sampling to keep the balance between the faulty and non-faulty instances. In empirical studies, we chose datasets from real-world software projects, such as Eclipse and NASA. Then we compared our approach with some classical baseline methods, and further investigated the influencing factors in our approach. The final results demonstrate the effectiveness of our approach, and provide a guideline for achieving cost-effective data preprocessing when using our two-stage approach.
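
The two stages described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say which relevance measure, clustering rule, or threshold value the paper uses, so the sketch substitutes absolute Pearson correlation for relevance, a greedy threshold-based grouping for redundancy control, and a default threshold of 0.8. All function names (`pearson`, `select_features`, `random_under_sample`, `preprocess`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov / math.sqrt(vx * vy))


def select_features(X, y, threshold=0.8):
    """Stage 1 (assumed form): relevance ranking, then redundancy control."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Relevance analysis: rank features by correlation with the fault label.
    ranked = sorted(range(n_features),
                    key=lambda j: pearson(columns[j], y), reverse=True)
    selected = []
    for j in ranked:
        # Keep a feature only if its correlation with every feature kept so
        # far is below the threshold; otherwise treat it as redundant.
        if all(pearson(columns[j], columns[k]) < threshold for k in selected):
            selected.append(j)
    return sorted(selected)


def random_under_sample(X, y, seed=0):
    """Stage 2: randomly drop majority-class rows until classes are balanced."""
    rng = random.Random(seed)
    faulty = [i for i, label in enumerate(y) if label == 1]
    clean = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted((faulty, clean), key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def preprocess(X, y, threshold=0.8, seed=0):
    """Run feature selection first, then instance reduction."""
    features = select_features(X, y, threshold)
    X_reduced = [[row[j] for j in features] for row in X]
    X_bal, y_bal = random_under_sample(X_reduced, y, seed)
    return features, X_bal, y_bal
```

Here `X` is a list of rows of numeric metrics and `y` is a list of 0/1 labels, with 1 marking a faulty module. Calling `preprocess(X, y)` returns the indices of the kept features together with the balanced, reduced dataset. The threshold controls how aggressively correlated features are pruned.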