Pattern Recognition: The Journal of the Pattern Recognition Society

Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets

Abstract

Canonical machine learning algorithms assume that the numbers of objects in the considered classes are roughly similar. However, in many real-life situations the distribution of examples is skewed, since the examples of some classes appear much more frequently. This poses a difficulty for learning algorithms, as they will be biased towards the majority classes. In recent years many solutions have been proposed to tackle imbalanced classification, yet they mainly concentrate on binary scenarios. Multi-class imbalanced problems are far more difficult, as the relationships between the classes are no longer straightforward. Additionally, one should analyze not only the imbalance ratio but also the characteristics of the objects within each class. In this paper we present a study on oversampling for multi-class imbalanced datasets that focuses on the analysis of the class characteristics. We detect subsets of specific examples in each class and set the oversampling for each of them independently. Thus, we are able to use information about the class structure and boost the more difficult and important objects. We carry out an extensive experimental analysis, backed up with statistical analysis, in order to check when preprocessing only some types of examples within a class may improve on the indiscriminate preprocessing of all the examples in all the classes. The results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing that does not consider the importance of example types. © 2016 Elsevier Ltd. All rights reserved.
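
The abstract does not say how the example types are detected or which oversampler is applied, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a neighbourhood-based labelling (safe, borderline, rare, outlier, judged from the share of same-class points among the k = 5 nearest neighbours), which is a common convention in the imbalanced-learning literature, and it uses plain random duplication as the oversampler. The function names `example_types` and `oversample_types`, the thresholds, and the choice of target types are all hypothetical and are not taken from the paper.

```python
import numpy as np

def example_types(X, y, k=5):
    """Label each example by the share of its k nearest neighbours
    that belong to the same class (thresholds are an assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; an example is not its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    share = (y[nn] == y[:, None]).mean(axis=1)
    types = np.full(len(y), "outlier", dtype="U10")
    types[share > 0.1] = "rare"
    types[share > 0.3] = "borderline"
    types[share > 0.7] = "safe"
    return types

def oversample_types(X, y, target_types=("borderline", "rare"), k=5, seed=0):
    """Randomly duplicate only the chosen example types of each smaller
    class until that class matches the size of the largest class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    types = example_types(X, y, k)
    in_target = np.array([t in target_types for t in types])
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        pool = np.flatnonzero((y == c) & in_target)
        if n == n_max or pool.size == 0:
            continue  # largest class, or no example of the chosen types
        picks = rng.choice(pool, size=n_max - n, replace=True)
        X_parts.append(X[picks])
        y_parts.append(y[picks])
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Calling `oversample_types(X, y)` returns the original data followed by the duplicated examples. A class with no example of the chosen types is left unchanged in this sketch, which is one of several possible design choices.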