SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling

Abstract

Class imbalance is a crucial problem in machine learning and occurs in many domains. The two-class problem in particular has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and to misclassify instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracy. Further, there is a need to handle both the imbalance between classes and the selection of examples within a class (the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In this way, it handles both within-class and between-class imbalance. Our experimental results on a number of multi-class problems show that, when the SCUT method is used to pre-process the data before classification, we obtain highly accurate models that compare favourably to the state-of-the-art.
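The hybrid idea described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say which class size is used as the balancing target or which clustering algorithm is applied, so the sketch assumes the mean class size as the target and substitutes a plain k-means for the cluster analysis. All function names (`smote_like`, `kmeans_labels`, `cluster_undersample`, `scut_like_resample`) and parameters are hypothetical.

```python
import numpy as np

def smote_like(X, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a point and one of its k nearest neighbours."""
    n = len(X)
    if n < 2:  # nothing to interpolate with, fall back to duplication
        return X[rng.integers(0, n, size=n_new)]
    k = min(k, n - 1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)       # seed point for each synthetic example
    other = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # position along the connecting segment
    return X[base] + gap * (X[other] - X[base])

def kmeans_labels(X, n_clusters, rng, n_iter=20):
    """Plain k-means (assumed stand-in for the paper's cluster analysis)."""
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def cluster_undersample(X, target, n_clusters, rng):
    """Keep `target` points, drawn round-robin from the clusters so each one stays represented."""
    labels = kmeans_labels(X, min(n_clusters, len(X)), rng)
    pools = [list(rng.permutation(np.flatnonzero(labels == c))) for c in np.unique(labels)]
    keep = []
    while len(keep) < target:
        for pool in pools:
            if pool and len(keep) < target:
                keep.append(pool.pop())
    return X[np.array(keep)]

def scut_like_resample(X, y, k=5, n_clusters=3, seed=0):
    """Bring every class to the mean class size (assumed target)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(counts.mean()))
    X_out, y_out = [], []
    for cls, n in zip(classes, counts):
        Xc = X[y == cls]
        if n < target:      # minority class: add synthetic examples
            Xc = np.vstack([Xc, smote_like(Xc, target - n, k, rng)])
        elif n > target:    # majority class: cluster, then undersample
            Xc = cluster_undersample(Xc, target, n_clusters, rng)
        X_out.append(Xc)
        y_out.extend([cls] * len(Xc))
    return np.vstack(X_out), np.array(y_out)

# Toy three-class problem with 120, 50 and 10 examples (mean class size 60).
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0, 1, (120, 2)),
               data_rng.normal(4, 1, (50, 2)),
               data_rng.normal(-4, 1, (10, 2))])
y = np.array([0] * 120 + [1] * 50 + [2] * 10)
X_bal, y_bal = scut_like_resample(X, y)
print(np.unique(y_bal, return_counts=True)[1])  # each class ends up with 60 examples
```

Drawing from every cluster rather than sampling the majority class uniformly is what addresses within-class imbalance in this sketch: small sub-concepts of a large class are less likely to vanish during undersampling. In practice one would more likely rely on tested library implementations of SMOTE and clustering than on these hand-written helpers.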

Bibliographic details

Source
International Conference on Knowledge Discovery and Information Retrieval | 2015 | 1 (CD-ROM) | 9 pages
Authors
Astha Agrawal; Herna L. Viktor; Eric Paquet
Original format: PDF
Chinese Library Classification (CLC): G354-53
Keywords
Multi-Class Imbalance; Undersampling; Oversampling; Classification; Clustering

Similar literature

1. SMOTE-RSB_*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory [J]. Enislay Ramentol, Yaile Caballero, Rafael Bello, Knowledge and Information Systems. 2012, No. 2
2. SMOTE-RSB *: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory [J]. Enislay Ramentol, Yailé Caballero, Rafael Bello, Knowledge and Information Systems. 2012, No. 2
3. Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling [J]. Julián Luengo, Alberto Fernández, Salvador García, Soft Computing - A Fusion of Foundations, Methodologies and Applications. 2011, No. 10
4. SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling [C]. Astha Agrawal, Herna L. Viktor, Eric Paquet, 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. 2015
5. SMOTE Variants for Imbalanced Binary Classification: Heart Disease Prediction [D]. Zheng, Xiaoru. 2020
6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016
7. SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling [O]. Agrawal, Astha; Viktor, Herna L.; Paquet, Eric. 2015
