Applied Sciences

Class Imbalance Ensemble Learning Based on the Margin Theory


Abstract

The proportion of instances belonging to each class in a dataset plays an important role in machine learning. However, real-world data often suffer from class imbalance. Dealing with multi-class tasks, where classes have different misclassification costs, is harder than dealing with two-class ones. Undersampling and oversampling are two of the most popular data preprocessing techniques for handling imbalanced datasets. Ensemble classifiers have been shown to be more effective than data sampling techniques at enhancing the classification performance of imbalanced data. Moreover, combining ensemble learning with sampling methods to tackle the class imbalance problem has led to several proposals in the literature, with positive results. The ensemble margin is a fundamental concept in ensemble learning. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. In this paper, we propose a novel ensemble-margin-based algorithm, which handles imbalanced classification by employing more low-margin examples, which are more informative than high-margin samples. This algorithm combines ensemble learning with undersampling, but instead of balancing classes randomly, as UnderBagging does, our method focuses on constructing higher-quality balanced sets for each base classifier. To demonstrate the effectiveness of the proposed method in handling class-imbalanced data, UnderBagging and SMOTEBagging are used in a comparative analysis. In addition, we also compare the performance of different ensemble margin definitions, including both supervised and unsupervised margins, in class imbalance learning.
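As a rough illustration of the supervised ensemble margin mentioned in the abstract (a sketch of the standard definition, not the authors' exact algorithm): the margin of an example is the fraction of base-classifier votes for its true class minus the largest fraction of votes for any other class. Examples with margins near zero or negative are the "low-margin" cases the abstract describes as most informative. The function name and array layout below are illustrative assumptions.

```python
import numpy as np

def ensemble_margin(votes, y_true, n_classes):
    """Supervised ensemble margin per example.

    votes  : (n_samples, T) array of predicted class labels,
             one column per base classifier.
    y_true : (n_samples,) array of true class labels.
    Returns values in [-1, 1]; low or negative margins flag hard examples.
    """
    n, T = votes.shape
    # Count votes per class for each example.
    counts = np.zeros((n, n_classes), dtype=int)
    for c in range(n_classes):
        counts[:, c] = (votes == c).sum(axis=1)
    v_true = counts[np.arange(n), y_true]
    # Exclude the true class before taking the max over the others.
    counts[np.arange(n), y_true] = -1
    v_other = counts.max(axis=1)
    return (v_true - v_other) / T

# Example: an ensemble of 3 base classifiers on a two-class problem.
votes = np.array([[0, 0, 1],   # 2 of 3 votes correct -> margin 1/3
                  [1, 1, 1]])  # all votes wrong      -> margin -1
print(ensemble_margin(votes, np.array([0, 0]), n_classes=2))
```

An unsupervised variant replaces the true-class vote count with the count of the most-voted class, so the margin measures ensemble agreement rather than correctness.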
