SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling

Abstract

Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and incorrectly classify instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracy. Further, there is a need both to handle the imbalance between classes and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In addition, it handles both within-class and between-class imbalance. Our experimental results against a number of multi-class problems show that, when the SCUT method is used for pre-processing the data before classification, we obtain highly accurate models that compare favourably to the state of the art.
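The abstract describes SCUT only at a high level, so the following is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the balancing target is the mean class size, uses plain SMOTE-style interpolation between same-class nearest neighbours to oversample minority classes, and substitutes a simple k-means for the unspecified "cluster analysis" step before drawing examples round-robin from the clusters of each majority class. The function names `scut`, `smote`, `kmeans` and `cluster_undersample` and the parameters `k_neighbours` and `n_clusters` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def smote(samples, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a random sample and one
    of its k nearest neighbours from the same class."""
    if len(samples) < 2:
        # Nothing to interpolate with: fall back to duplicating the lone example.
        return [list(samples[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [j for j in range(len(samples)) if j != i]
        neighbours = sorted(others, key=lambda j: sq_dist(base, samples[j]))[:k]
        nb = samples[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def kmeans(samples, k, rng, iters=10):
    """Plain k-means, used here as an assumed stand-in for the clustering
    step; returns only the non-empty clusters."""
    centroids = [list(c) for c in rng.sample(samples, k)]
    clusters = [list(samples)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda c: sq_dist(s, centroids[c]))
            clusters[nearest].append(s)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [c for c in clusters if c]


def cluster_undersample(samples, target, n_clusters, rng):
    """Cluster a majority class, then draw examples round-robin from the
    clusters (an assumed selection rule) so every cluster stays represented."""
    target = min(target, len(samples))
    clusters = kmeans(samples, min(n_clusters, len(samples)), rng)
    pools = [rng.sample(c, len(c)) for c in clusters]  # shuffled copies
    kept = []
    while len(kept) < target:
        for pool in pools:
            if pool and len(kept) < target:
                kept.append(pool.pop())
    return kept


def scut(X, y, k_neighbours=5, n_clusters=3, seed=0):
    """Balance every class to the mean class size (an assumed target)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))
    target = len(X) // len(by_class)
    X_out, y_out = [], []
    for label, samples in by_class.items():
        if len(samples) < target:  # minority class: add synthetic examples
            samples = samples + smote(samples, target - len(samples), k_neighbours, rng)
        elif len(samples) > target:  # majority class: cluster, then undersample
            samples = cluster_undersample(samples, target, n_clusters, rng)
        X_out.extend(samples)
        y_out.extend([label] * len(samples))
    return X_out, y_out


if __name__ == "__main__":
    # Toy 3-class data: 12 / 6 / 3 examples, so the mean class size is 7.
    X = ([[float(i), 0.0] for i in range(12)]
         + [[float(i), 5.0] for i in range(6)]
         + [[float(i), 10.0] for i in range(3)])
    y = ["a"] * 12 + ["b"] * 6 + ["c"] * 3
    X_bal, y_bal = scut(X, y)
    print(dict(sorted(Counter(y_bal).items())))  # {'a': 7, 'b': 7, 'c': 7}
```

Drawing round-robin from the clusters, rather than uniformly from the whole majority class, is one way to keep small sub-concepts from being dropped, which is the within-class concern the abstract raises. A draw proportional to cluster size would be an equally plausible reading of "cluster-based undersampling".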

Bibliographic details

Source: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2015, pp. 226-234 (9 pages)
Conference location: Lisbon, Portugal
Authors: Astha Agrawal; Herna L. Viktor; Eric Paquet
Author affiliations: School of Electrical Engineering and Computer Science, University of Ottawa, Ontario, Canada (all three authors)
Original format: PDF
Full-text language: English
Keywords: Clustering algorithms; Training; Sampling methods; Proteins; Data models; Cancer; Probability distribution

Similar literature

1. SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory [J]. Enislay Ramentol, Yaile Caballero, Rafael Bello. Knowledge and Information Systems, 2012, no. 2.
2. SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory [J]. Enislay Ramentol, Yailé Caballero, Rafael Bello. Knowledge and Information Systems, 2012, no. 2.
3. Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling [J]. Julián Luengo, Alberto Fernández, Salvador García. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2011, no. 10.
4. SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling [C]. Astha Agrawal, Herna L. Viktor, Eric Paquet. International Conference on Knowledge Discovery and Information Retrieval, 2015.
5. SMOTE Variants for Imbalanced Binary Classification: Heart Disease Prediction [D]. Xiaoru Zheng. 2020.
6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016.
7. SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling [O]. Astha Agrawal, Herna L. Viktor, Eric Paquet. 2015.
