Journal: Mathematical Problems in Engineering: Theory, Methods and Applications

Combining Imbalance Learning Strategy and Multiclassifier Estimator for Bug Report Classification


Abstract

A large number of bug reports are submitted to bug repositories every day, so efficiently assigning each report to the correct developer is a considerable challenge. Because the components of different projects differ widely, current bug classification relies mainly on the component recorded in a bug report to dispatch the report to the designated developer or developer community. Unfortunately, the component information of a bug report is filled in by default based on the submitter's input, and the result is often incorrect. An automatic technique that can identify high-impact bug reports can therefore help developers become aware of them early, fix them quickly, and minimize the damage they cause. In this paper, we propose a method that combines imbalanced learning strategies, namely random undersampling (RUS), random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and the AdaCost algorithm, with the multiclass classification methods one-vs-one (OVO) and one-vs-all (OVA) to solve the bug report component classification problem. We investigate the effectiveness of different combinations, i.e., variants, each of which pairs a specific imbalance learning strategy with a specific classification algorithm. We perform an analytical study on five open bug repositories (Eclipse, Mozilla, GCC, OpenOffice, and NetBeans). The results show that the variants differ in performance for bug report component identification, and that the best-performing variant combines the imbalanced learning strategy RUS with the OVA method based on the SVM classifier.
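To make the idea of a "variant" concrete, the following is a minimal sketch of one pairing named in the abstract: random undersampling (RUS) followed by a one-vs-all (OVA) decomposition. It is not the authors' implementation. The toy feature vectors, the component labels, the function names, and the choice to undersample before decomposing are all assumptions made for illustration, and a simple nearest-centroid scorer stands in for the SVM so that the example needs only the standard library.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """RUS: randomly drop samples until every class matches the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, n_min):
            X_bal.append(features)
            y_bal.append(label)
    return X_bal, y_bal


def one_vs_all_targets(y):
    """OVA: one binary problem per class (1 = this class, 0 = all others)."""
    return {c: [1 if label == c else 0 for label in y] for c in sorted(set(y))}


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_binary(X, targets):
    """Placeholder binary learner (NOT an SVM): positive and negative centroids."""
    pos = centroid([x for x, t in zip(X, targets) if t == 1])
    neg = centroid([x for x, t in zip(X, targets) if t == 0])
    return pos, neg


def score(model, x):
    """Higher score means x lies closer to the positive centroid."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return d_neg - d_pos


def fit_ova(X, y):
    """Train one binary model per component label."""
    return {c: fit_binary(X, t) for c, t in one_vs_all_targets(y).items()}


def predict_ova(models, x):
    """Pick the component whose binary model gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))


# Hypothetical, imbalanced toy data: 2-D feature vectors per bug report.
X = [[5, 0], [6, 1], [5, 1], [6, 0], [4, 0], [5, 2],   # "UI" (majority)
     [0, 5], [1, 6], [0, 4],                           # "Core"
     [10, 10], [11, 9]]                                # "Debug" (minority)
y = ["UI"] * 6 + ["Core"] * 3 + ["Debug"] * 2

X_bal, y_bal = random_undersample(X, y, seed=1)
models = fit_ova(X_bal, y_bal)
print(Counter(y_bal))
print(predict_ova(models, [5, 1]), predict_ova(models, [0, 5]))
```

Swapping `random_undersample` for an oversampling routine, or `fit_binary` for a real SVM, would give a different variant in the sense the abstract describes.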

