Published in: International Conference on Software, Knowledge Information Management and Applications

Cluster-based under-sampling with random forest for multi-class imbalanced classification


Abstract

Multi-class imbalanced classification has emerged as a very challenging research area in machine learning for data mining applications. It occurs when the number of training instances representing the majority classes is much higher than the number of minority class instances. Existing machine learning algorithms achieve good accuracy when classifying majority class instances but ignore or misclassify minority class instances. However, the minority class instances often hold the most vital information, and misclassifying them can lead to serious problems. Several sampling techniques combined with ensemble learning have been proposed for binary-class imbalanced classification over the last decade. In this paper, we propose a new ensemble learning technique that employs cluster-based under-sampling with the random forest algorithm to handle highly imbalanced multi-class data classification. The proposed approach clusters the majority class instances and then selects the most informative majority class instances in each cluster to form several balanced datasets. The random forest algorithm is then applied to each balanced dataset, and a majority voting technique classifies test/new instances. We compared the performance of the proposed method with popular sampling-with-boosting methods, namely AdaBoost, RUSBoost, and SMOTEBoost, on 13 benchmark imbalanced datasets. The experimental results show that the proposed cluster-based under-sampling with random forest achieves high accuracy in classifying both majority and minority class instances compared with existing methods.
