International Journal of Hybrid Intelligent Systems

Solving class imbalance problem using bagging, boosting techniques, with and without using noise filtering method



Abstract

In numerous real-world applications and domains, the class imbalance problem is prevalent and an active topic of research. In such problems, most of the data is labeled as one class, called the majority class, while far less data is labeled as the other class, called the minority class (the more important class to focus on). However, none of the existing approaches has performed efficiently in terms of accuracy. This work presents a comparison of the performance of several boosting and bagging techniques on imbalanced datasets. A wide range of data mining and machine learning applications encounter the class imbalance problem. An imbalanced dataset consists of samples with a skewed distribution, and traditional methods are biased towards the negative (majority) samples. A popular pre-processing technique for handling the class imbalance problem is over-sampling. It balances the dataset to achieve a high classification rate and avoids the bias towards the majority class samples. Over-sampling takes all minority samples in the training data into consideration when performing classification. However, the presence of noise (in both the minority and majority samples) may degrade classification performance. Hence, this work presents a performance comparison of boosting and bagging, with and without noise filtering. It evaluates state-of-the-art ensemble learning methods, namely AdaBoost, RUSBoost, SMOTEBoost, Bagging, OverBagging, and SMOTEBagging, on 25 imbalanced binary-class datasets with various imbalance ratios (IR). The experimental results, measured with F-measure and AUC, show that the approach is promising and effective for dealing with imbalanced datasets.
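To make the comparison concrete, below is a minimal sketch of this kind of bagging-versus-boosting comparison on an imbalanced binary dataset, scored with the same metrics the paper reports (F-measure and AUC). It assumes scikit-learn and the imbalanced-learn package (which provides RUSBoostClassifier); the synthetic dataset, estimator settings, and 9:1 imbalance ratio are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: compare bagging- and boosting-style ensembles on an imbalanced
# dataset, reporting F-measure (on the minority class) and AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.ensemble import RUSBoostClassifier

# Synthetic binary dataset with a 9:1 imbalance ratio (IR = 9);
# class 1 is the minority class.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Bagging":  BaggingClassifier(n_estimators=100, random_state=42),
    "RUSBoost": RUSBoostClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]   # scores for AUC
    pred = model.predict(X_te)                # hard labels for F-measure
    print(f"{name:10s} F-measure={f1_score(y_te, pred):.3f} "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```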
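The abstract's "with and without noise filtering" comparison can likewise be sketched. The snippet below contrasts plain SMOTE over-sampling with SMOTE followed by Edited Nearest Neighbours cleaning (imbalanced-learn's SMOTEENN), which removes noisy samples near the class boundary. ENN is one common filtering choice; the abstract does not name the paper's exact filter, so this pairing is an assumption.

```python
# Sketch: over-sampling without noise filtering (SMOTE) versus with a
# noise filter (SMOTE + ENN cleaning via SMOTEENN).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN

# flip_y injects label noise, mimicking noisy minority/majority samples.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], flip_y=0.05,
                           random_state=42)

X_sm, y_sm = SMOTE(random_state=42).fit_resample(X, y)     # no filtering
X_cl, y_cl = SMOTEENN(random_state=42).fit_resample(X, y)  # with filtering

print("original:          ", Counter(y))
print("SMOTE only:        ", Counter(y_sm))
print("SMOTE + ENN filter:", Counter(y_cl))
```

Running either resampled set through the ensembles above reproduces the paper's four-way comparison (bagging/boosting × with/without filtering).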
