International Journal of Information Technology and Computer Science

Performance Analysis of Resampling Techniques on Class Imbalance Issue in Software Defect Prediction



Abstract

Predicting defects at an early stage of the software development life cycle can improve the quality of the end product at lower cost. Machine learning techniques have proven effective for software defect prediction; however, class imbalance in software defect datasets is the main cause of low and biased classifier performance. This issue can be addressed by applying resampling methods to the software defect dataset before classification. This research analyzes the performance of three widely used resampling techniques on the class imbalance issue in software defect prediction: “Random Under Sampling”, “Random Over Sampling”, and “Synthetic Minority Oversampling Technique (SMOTE)”. The experiments use 12 publicly available cleaned NASA MDP datasets with 10 widely used supervised machine learning classifiers. Performance is evaluated through several measures: F-measure, Accuracy, MCC (Matthews correlation coefficient), and ROC. According to the results, most of the classifiers performed better with the “Random Over Sampling” technique on many of the datasets.
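
The abstract names the three resampling techniques but does not describe how they were implemented, so the following is only a minimal sketch of what each one does, written with the Python standard library. The function names (`random_over_sample`, `random_under_sample`, `smote_like`), the toy data, and the parameter `k` are assumptions made for illustration and are not taken from the paper. In particular, `smote_like` is a simplified stand-in for SMOTE that interpolates between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def smote_like(rows, labels, minority, k=2, seed=0):
    """Simplified SMOTE-style oversampling (assumes at least two minority rows).

    Each synthetic row lies on the line segment between a random minority row
    and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    pool = [r for r, y in zip(rows, labels) if y == minority]
    out_rows, out_labels = list(rows), list(labels)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        out_rows.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
        out_labels.append(minority)
    return out_rows, out_labels


# Toy imbalanced data (hypothetical): label 1 = defective module (minority class).
rows = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0),
        (5.5, 8.5), (7.0, 9.5), (6.5, 7.5), (5.8, 8.8)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

for name, (_, y) in [
    ("over", random_over_sample(rows, labels)),
    ("under", random_under_sample(rows, labels)),
    ("smote-like", smote_like(rows, labels, minority=1)),
]:
    print(name, dict(Counter(y)))
```

In practice, resampling would normally be applied to the training split only, so that the test data keeps its original class distribution; whether the study followed that protocol is not stated in the abstract.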
