Journal: Pattern Recognition: The Journal of the Pattern Recognition Society
A novel ensemble method for classifying imbalanced data

Abstract

Class imbalance problems have been reported to severely hinder the classification performance of many standard learning algorithms and have attracted a great deal of attention from researchers in different fields. A number of methods have therefore been proposed to solve these problems, such as sampling methods, cost-sensitive learning methods, and bagging- and boosting-based ensemble methods. However, because they may alter the original data distribution, these conventional class imbalance handling methods might suffer from the loss of potentially useful information, introduce unexpected mistakes, or increase the likelihood of overfitting. We therefore propose a novel ensemble method that first converts an imbalanced data set into multiple balanced ones and then builds a number of classifiers on these data sets with a specific classification algorithm. Finally, the classification results of these classifiers for new data are combined by a specific ensemble rule. In the empirical study, our method was compared with different class imbalance handling methods: three conventional sampling methods, one cost-sensitive learning method, six Bagging- and Boosting-based ensemble methods, our previous method EM1vs1, and two fuzzy-rule-based classification methods. The experimental results on 46 imbalanced data sets show that our proposed method is usually superior to the conventional imbalance handling methods when solving highly imbalanced problems. (C) 2014 Elsevier Ltd. All rights reserved.
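
The three steps named in the abstract (convert to multiple balanced sets, train one classifier per set, combine the predictions) can be illustrated with a minimal Python sketch. The abstract does not say how the data are converted, which classification algorithm is used, or which ensemble rule is applied, so everything concrete here is an assumption made for illustration: the majority class is randomly partitioned into chunks of roughly minority-class size, the base learner is a toy nearest-centroid classifier, and the ensemble rule is a plain majority vote. All function and class names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def split_into_balanced_sets(X, y, minority_label, seed=0):
    """ASSUMED conversion step: randomly partition the majority class into
    chunks of about minority size and pair each chunk with all minority samples."""
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    random.Random(seed).shuffle(majority)
    n_sets = max(1, round(len(majority) / len(minority)))
    # Deal majority samples round-robin so each one is used exactly once.
    chunks = [majority[i::n_sets] for i in range(n_sets)]
    return [chunk + minority for chunk in chunks]


class NearestCentroid:
    """Toy base learner (an assumption, not the paper's algorithm)."""

    def fit(self, samples):
        sums, counts = {}, Counter()
        for x, lab in samples:
            counts[lab] += 1
            acc = sums.setdefault(lab, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
        # Per-class mean of each feature.
        self.centroids = {
            lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()
        }
        return self

    def predict(self, x):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda lab: squared_distance(self.centroids[lab]))


def train_ensemble(balanced_sets):
    """Build one classifier per balanced data set."""
    return [NearestCentroid().fit(samples) for samples in balanced_sets]


def predict_majority_vote(models, x):
    """ASSUMED ensemble rule: the label predicted by most classifiers wins."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]


# Synthetic 9:1 imbalanced data: class 0 near (0, 0), class 1 near (5, 5).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
X += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

balanced_sets = split_into_balanced_sets(X, y, minority_label=1)
models = train_ensemble(balanced_sets)
print(len(balanced_sets), "balanced sets,", len(models), "classifiers")
print("prediction for (5, 5):", predict_majority_vote(models, (5.0, 5.0)))
```

In this sketch every majority sample appears in exactly one balanced set and no synthetic samples are created, which is one way to avoid discarding data or altering the class-conditional distributions. Whether the paper's own conversion works this way cannot be determined from the abstract alone.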
