Applying Novel Resampling Strategies To Software Defect Prediction

Abstract

Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms. The resulting classifiers often fail to predict the faulty (minority) class at all. This problem is well known in machine learning and is often referred to as learning from imbalanced datasets. We examine stratification, a widely used technique for learning from imbalanced data that has received little attention in software defect prediction. Our experiments focus on the SMOTE technique, which is a method of over-sampling minority-class examples. Our goal is to determine whether SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that SMOTE resampling yields a more balanced classification. We found an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.
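The two technical ingredients named in the abstract, SMOTE over-sampling and the geometric mean accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the basic SMOTE idea of interpolating between a minority sample and one of its k nearest minority-class neighbours, it simplifies the procedure by picking base samples at random, and it assumes the common definition of the geometric mean as the square root of sensitivity times specificity. The function names `smote` and `geometric_mean` and all parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their nearest minority-class neighbours (simplified SMOTE sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Simplification: pick the base sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def geometric_mean(y_true, y_pred, positive=1):
    """Assumed definition: sqrt(sensitivity * specificity), binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    pos = sum(1 for t in y_true if t == positive)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return math.sqrt((tp / pos) * (tn / neg))
```

Under this definition, a classifier that never predicts the faulty class has zero sensitivity and therefore a geometric mean of zero, however high its overall accuracy. That is why the metric is a natural fit for the imbalance problem the abstract describes.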

Bibliographic record

  • Source
    2007, pp. 69-72 (4 pages)
  • Authors

    Pelayo, Lourdes; Dick, Scott
