International Conference on Software, Knowledge, Information Management and Applications

Cluster-based under-sampling with random forest for multi-class imbalanced classification



Abstract

Multi-class imbalanced classification has emerged as a very challenging research area in machine learning for data mining applications. It occurs when the number of training instances in the majority classes is much higher than that in the minority classes. Existing machine learning algorithms achieve good accuracy when classifying majority class instances, but tend to ignore or misclassify minority class instances. However, the minority class instances often hold the most vital information, and misclassifying them can lead to serious problems. Several sampling techniques combined with ensemble learning have been proposed for binary-class imbalanced classification over the last decade. In this paper, we propose a new ensemble learning technique that employs cluster-based under-sampling with the random forest algorithm to handle highly imbalanced multi-class data. The proposed approach clusters the majority class instances and then selects the most informative majority class instances in each cluster to form several balanced datasets. A random forest is then trained on each balanced dataset, and majority voting is applied to classify new/test instances. We compared the performance of the proposed method with popular sampling-with-boosting methods, namely AdaBoost, RUSBoost, and SMOTEBoost, on 13 benchmark imbalanced datasets. The experimental results show that the proposed cluster-based under-sampling with random forest achieves high accuracy in classifying both majority and minority class instances compared with the existing methods.
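Below is a minimal sketch of the workflow described in the abstract, assuming k-means for clustering the majority class, "most informative" interpreted as the instances nearest each cluster centroid, and scikit-learn's RandomForestClassifier as the base learner; the cluster count, the number of balanced subsets, and the selection rule are illustrative assumptions, not the authors' exact design.

```python
# Sketch: cluster-based under-sampling + random forest ensemble with majority voting.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def fit_ensemble(X, y, n_clusters=10, n_subsets=5, random_state=0):
    """Build one random forest per balanced subset obtained by
    cluster-based under-sampling of the largest (majority) class."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)   # treat the largest class as the majority class
    minority_size = min(counts.values())     # target size for the under-sampled majority

    maj_mask = (y == majority)
    X_maj, y_maj = X[maj_mask], y[maj_mask]
    X_rest, y_rest = X[~maj_mask], y[~maj_mask]

    # Cluster the majority-class instances.
    km = KMeans(n_clusters=n_clusters, random_state=random_state).fit(X_maj)
    dist = km.transform(X_maj)               # distance of each majority instance to every centroid

    rng = np.random.default_rng(random_state)
    per_cluster = max(1, minority_size // n_clusters)
    forests = []
    for s in range(n_subsets):
        picked = []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            if members.size == 0:
                continue
            # "Most informative" here = closest to the cluster centroid (an assumption),
            # with a small random offset so the subsets differ from each other.
            order = members[np.argsort(dist[members, c])]
            picked.extend(order[: per_cluster + int(rng.integers(0, 3))])
        X_bal = np.vstack([X_maj[picked], X_rest])
        y_bal = np.concatenate([y_maj[picked], y_rest])
        rf = RandomForestClassifier(n_estimators=100, random_state=random_state + s)
        forests.append(rf.fit(X_bal, y_bal))
    return forests

def predict(forests, X_test):
    """Classify test instances by majority voting over the forests."""
    votes = np.stack([f.predict(X_test) for f in forests])   # shape: (n_subsets, n_test)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

Usage would look like `forests = fit_ensemble(X_train, y_train)` followed by `y_pred = predict(forests, X_test)`; training one forest per balanced subset and voting across them is what lets the ensemble cover the majority class without drowning out the minority classes.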
