International Conference on Cloud Computing, Data Science & Engineering

Handling Imbalanced Data using Ensemble Learning in Software Defect Prediction

Abstract

As the software industry continues to grow, software defect prediction has become one of the key ingredients in producing good-quality software. Defects uncovered in time help save resources in terms of time, effort and money. However, the imbalanced nature of software defect data can degrade model performance and lead to incorrect interpretation of results. This problem has drawn the attention of researchers, and many solutions have been proposed to counter its effect. This paper provides an empirical comparison of software defect prediction models built with various boosting-based ensemble methods on three open-source Java projects. Four of these ensemble methods incorporate resampling techniques internally. The resulting models are evaluated using stable performance metrics such as Balance, G-Mean and AUC. Results show that applying resampling before classification within the ensemble significantly improves prediction compared with classic boosting models. RUSBoost is the clear winner overall, followed by MSMOTEBoost and SMOTEBoost.
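
The evaluation metrics and the resampling idea named above can be made concrete with a short sketch. The Python below is a minimal, illustrative example and not the paper's code. It assumes label 1 marks a defective (minority-class) module and that both classes are present. It first computes pd (probability of detection) and pf (probability of false alarm) from a confusion matrix. It then derives G-Mean from them, along with Balance in the form common in the defect-prediction literature. The abstract does not state the exact Balance formula, so that form is an assumption here. AUC is computed with the rank-based (Mann-Whitney) formulation. The last helper shows the random-undersampling step that gives RUSBoost its name. All function names are hypothetical. The helper also omits the boosting-weight bookkeeping that the full RUSBoost algorithm performs in every round.

```python
import math
import random
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), treating label 1 as 'defective' (assumed minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def pd_pf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """pd = recall on defective modules (TPR); pf = false-alarm rate on clean modules (FPR)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of recall on the defective class (pd) and on the clean class (1 - pf)."""
    pd, pf = pd_pf(y_true, y_pred)
    return math.sqrt(pd * (1 - pf))


def balance(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Assumed form: 1 minus the normalized distance from the ideal ROC point (pf=0, pd=1)."""
    pd, pf = pd_pf(y_true, y_pred)
    return 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random defective module outscores a random clean one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_undersample(X: Sequence, y: Sequence[int], seed: int = 0):
    """Randomly drop majority-class (label 0) rows until both classes have equal size.

    Simplified view of RUSBoost's per-round resampling; assumes label 1 is the minority.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]
```

As a usage example, take 2 defective and 8 clean modules where the model catches one defect and flags one clean module. Then pd = 0.5 and pf = 0.125, giving a G-Mean of about 0.661 and a Balance of about 0.636. Both metrics reward a model that detects defects without flooding developers with false alarms. Plain accuracy would give the same model 0.8 simply because clean modules dominate.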