Published in: International Conference of Computer and Information Technology

A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data



Abstract

Imbalanced learning is the problem of learning from data whose class distribution is highly skewed. Class imbalance arises in a growing number of domains and poses a challenge to traditional classification techniques. Learning from imbalanced data (two or more classes) introduces additional complexity. Studies suggest that ensemble methods can produce more accurate results than standard imbalanced-learning techniques such as sampling and cost-sensitive learning. To address the problem, we propose a new hybrid under-sampling-based ensemble approach (HUSBoost) for imbalanced data, which comprises three basic steps: data cleaning, data balancing, and classification. First, we remove noisy data using Tomek links. We then create several balanced subsets by applying random under-sampling (RUS) to the majority-class instances. These under-sampled majority-class instances, together with the minority-class instances, constitute subsets of the imbalanced dataset; having equal numbers of majority- and minority-class instances, they are balanced subsets of the data. On each balanced subset, random forest (RF), AdaBoost with decision trees (CART), and AdaBoost with support vector machines (SVM) are trained in parallel, and a soft-voting approach combines their outputs. The results of these ensemble classifiers are then averaged over all balanced subsets. We use 27 datasets with different imbalance ratios to verify the effectiveness of the proposed model and compare its experimental results with the RUSBoost and EasyEnsemble methods.
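The three-step pipeline described in the abstract can be sketched roughly as follows, assuming scikit-learn. This is an illustrative sketch, not the authors' implementation: the Tomek-link cleaner is hand-rolled for self-containment, AdaBoost uses its default CART base learner, and a probability-calibrated SVC stands in for the paper's AdaBoost-SVM member.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC

def tomek_links_clean(X, y, majority):
    """Step 1: drop the majority-class member of every Tomek link,
    i.e. each pair of opposite-class mutual nearest neighbors."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    keep = np.ones(len(y), dtype=bool)
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            keep[i if y[i] == majority else j] = False
    return X[keep], y[keep]

def husboost_predict(X, y, X_test, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    majority, minority = classes[counts.argmax()], classes[counts.argmin()]
    X, y = tomek_links_clean(X, y, majority)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y == minority)
    prob = np.zeros(len(X_test))
    for _ in range(n_subsets):
        # Step 2: random under-sampling of the majority class
        # yields a balanced subset
        idx = np.concatenate(
            [rng.choice(maj_idx, size=min_idx.size, replace=False), min_idx])
        Xb, yb = X[idx], y[idx]
        # Step 3: ensemble members trained on the subset; soft voting
        # averages their predicted minority-class probabilities
        members = [RandomForestClassifier(n_estimators=50, random_state=seed),
                   AdaBoostClassifier(n_estimators=50, random_state=seed),
                   SVC(probability=True, random_state=seed)]
        p = [m.fit(Xb, yb).predict_proba(X_test)[:, list(m.classes_).index(minority)]
             for m in members]
        prob += np.mean(p, axis=0) / n_subsets  # average over all subsets
    return np.where(prob >= 0.5, minority, majority)
```

In practice, the `imbalanced-learn` package offers tested implementations of steps 1 and 2 (`TomekLinks` and `RandomUnderSampler`), which would replace the hand-rolled cleaner and the manual index sampling here.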
