Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets

Saez Jose A.; Krawczyk Bartosz; Wozniak Michal

Abstract

Canonical machine learning algorithms assume that the number of objects in each of the considered classes is roughly similar. However, in many real-life situations the distribution of examples is skewed, since the examples of some classes appear much more frequently. This poses a difficulty for learning algorithms, as they will be biased towards the majority classes. In recent years many solutions have been proposed to tackle imbalanced classification, yet they mainly concentrate on binary scenarios. Multi-class imbalanced problems are far more difficult, as the relationships between the classes are no longer straightforward. Additionally, one should analyze not only the imbalance ratio but also the characteristics of the objects within each class. In this paper we present a study on oversampling for multi-class imbalanced datasets that focuses on the analysis of the class characteristics. We detect subsets of specific examples in each class and set the oversampling for each of them independently. Thus, we are able to use information about the class structure and boost the more difficult and important objects. We carry out an extensive experimental analysis, backed up with statistical analysis, in order to check when preprocessing only some types of examples within a class may improve on the indiscriminate preprocessing of all the examples in all the classes. The results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing methods that do not consider the importance of example types. (C) 2016 Elsevier Ltd. All rights reserved.
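The idea of detecting example types within each class and oversampling only some of them can be illustrated with a minimal sketch. The abstract does not name the types or the detection procedure, so everything specific here is an assumption: the neighbourhood-based labels (safe, borderline, rare, outlier), the thresholds on the share of same-class neighbours, the plain random duplication, and the function names example_types and oversample_types are all hypothetical and are not taken from the paper.

```python
import numpy as np

def example_types(X, y, k=5):
    """Label each example by the share of its k nearest neighbours that have the same class.
    The four labels and their thresholds are assumptions made for illustration."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; the diagonal is masked so a point is not its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    frac = (y[neighbours] == y[:, None]).sum(axis=1) / k
    types = np.full(len(y), "outlier", dtype=object)  # frac == 0
    types[frac > 0.0] = "rare"
    types[frac > 0.3] = "borderline"
    types[frac > 0.7] = "safe"
    return types

def oversample_types(X, y, types, target_types=("borderline",), seed=0):
    """Randomly duplicate only examples of the chosen types until every class
    reaches the size of the largest class (when it has examples of those types)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    in_target = np.array([t in target_types for t in types])
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        need = counts.max() - n
        pool = np.flatnonzero((y == c) & in_target)
        if need == 0 or pool.size == 0:
            continue  # class is already the largest, or has no example of the chosen types
        picks = rng.choice(pool, size=need, replace=True)
        X_parts.append(X[picks])
        y_parts.append(y[picks])
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Calling example_types(X, y) and then oversample_types(X, y, types, target_types=("borderline", "rare")) would rebalance the classes using only the chosen types. An indiscriminate baseline, by contrast, would pass all four types.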

Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society | 2016, issue not specified | 15 pages
Authors: Saez Jose A.; Krawczyk Bartosz; Wozniak Michal
Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Machine learning; Imbalanced classification; Multi-class imbalance; Oversampling; Minority class types

Similar literature

1. Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets [J]. Saez Jose A., Krawczyk Bartosz, Wozniak Michal. Pattern Recognition: The Journal of the Pattern Recognition Society, 2016.
2. LoRAS: an oversampling approach for imbalanced datasets [J]. Bej Saptarshi, Davtyan Narek, Wolfien Markus. Machine Learning, 2021, issue 2.
3. TGT: A Novel Adversarial Guided Oversampling Technique for Handling Imbalanced Datasets [J]. Ayat Mahmoud, Ayman El-Kilany, Farid Ali. Egyptian Informatics Journal, 2021, issue 4.
4. Effects of Distance between Classes and Training Datasets Size to the Performance of XCS: Case of Imbalance Datasets [C]. Thach H. Nguyen, Sombut Foitong, Somchai Udomthanapong. International MultiConference of Engineers and Computer Scientists, 2007.
5. Classifier design to improve pattern classification and knowledge discovery for imbalanced datasets [D]. Wang, Kun. 2009.
6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016.
7. Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets with Class Overlapping [O]. Changhui Liu, Sun Jin, Donghong Wang. 2020.
