Conference: Chinese Control Conference

Oversampling boosting for classification of imbalanced software defect data



Abstract

In software defect prediction, a common and significant problem is data imbalance: non-defect-prone modules far outnumber defect-prone modules. This imbalance biases most typical classifiers, such as LR, SVM, decision trees, and boosting, toward the majority class of non-defect-prone modules. In most cases, however, the minority class of defect-prone modules is of greater interest, because the goal is to detect as many defect-prone modules as possible. To improve identification of the minority class, we propose an adaptive weight-updating scheme based on AdaBoost. We first employ SMOTE, or any other synthetic sample generation method, to balance the training datasets. Each synthetic sample is then assigned a penalty factor adaptively, according to the sample's density. The penalty factor is introduced into the cost function to adjust the samples' weights, so that the base classifiers are guided to learn from reliable synthetic samples rather than noisy ones. The result is a more reliable classifier with higher accuracy on the minority class. A series of experiments on MDP, a collection of NASA software defect datasets, demonstrates the effectiveness of our method.
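The pipeline outlined in the abstract (oversample, score each synthetic sample by density, fold the score into the boosting weight update) can be sketched roughly as follows. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names `smote`, `density_penalty`, and `update_weights` are hypothetical, the penalty is taken to be the fraction of majority-class points among a sample's nearest real neighbours, and the penalty enters the AdaBoost update as a simple multiplicative factor `(1 - penalty)`. The authors' actual penalty and cost function may well differ.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE-style oversampling: each synthetic point lies on the
    segment between a minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def density_penalty(point, minority, majority, k=5):
    """ASSUMED penalty in [0, 1]: share of majority-class points among the
    k nearest real samples. 0 means a dense minority region (reliable),
    1 means the point is surrounded by the majority class (likely noise)."""
    labelled = [(math.dist(point, p), 1) for p in majority]
    labelled += [(math.dist(point, p), 0) for p in minority]
    nearest = sorted(labelled)[:k]
    return sum(flag for _, flag in nearest) / len(nearest)


def update_weights(weights, y_true, y_pred, penalties):
    """One AdaBoost-style round with labels in {-1, +1}. The standard
    exponential update is scaled by (1 - penalty); this scaling is an
    ASSUMPTION, not the paper's formula. Real samples use penalty 0."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    new = [
        w * math.exp(-alpha * t * p) * (1 - c)
        for w, t, p, c in zip(weights, y_true, y_pred, penalties)
    ]
    total = sum(new)
    return [w / total for w in new], alpha
```

Under these assumptions, a synthetic sample generated deep inside the majority region keeps little or no weight across boosting rounds, while one in a dense minority region is treated almost like a real sample. A softer factor such as `exp(-penalty)` would be an equally plausible choice if zeroing out samples is too aggressive.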
