International Conference of Computer and Information Technology

A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data



Abstract

Imbalanced learning is the problem of learning from data whose class distribution is highly skewed. Class imbalance problems appear increasingly across many domains and pose a challenge to traditional classification techniques, and learning from imbalanced data (with two or more classes) introduces additional complexity. Studies suggest that ensemble methods can produce more accurate results than standard imbalance-learning techniques such as sampling and cost-sensitive learning. To address this problem, we propose a new hybrid under-sampling-based ensemble approach (HUSBoost) for handling imbalanced data, which comprises three basic steps: data cleaning, data balancing, and classification. First, we remove noisy data using Tomek-Links. We then create several balanced subsets by applying random under-sampling (RUS) to the majority-class instances. These under-sampled majority-class instances, together with the minority-class instances, constitute the subsets of the imbalanced data set; since each subset contains equal numbers of majority- and minority-class instances, each is balanced. In each balanced subset, a random forest (RF), AdaBoost with a decision tree (CART), and AdaBoost with a support vector machine (SVM) are run in parallel, and their outputs are combined by soft voting. The final prediction averages the results of these ensemble classifiers over all balanced subsets. We use 27 data sets with different imbalance ratios to verify the effectiveness of the proposed model and compare its experimental results with the RUSBoost and EasyEnsemble methods.
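The balancing and classification steps of the abstract can be sketched with scikit-learn. This is a minimal illustration under assumptions of our own: the synthetic dataset, all hyperparameters, and the number of subsets are illustrative rather than taken from the paper, and the Tomek-Links cleaning step is elided (a library such as imbalanced-learn provides `TomekLinks` for it).

```python
# Sketch of a HUSBoost-style pipeline: random under-sampling into balanced
# subsets, three classifiers per subset, soft-vote probabilities averaged
# over all subsets. Dataset and hyperparameters are illustrative only; the
# Tomek-Links data-cleaning step from the paper is omitted here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

maj_idx = np.flatnonzero(y_tr == 0)          # majority class
min_idx = np.flatnonzero(y_tr == 1)          # minority class
n_subsets = len(maj_idx) // len(min_idx)     # enough subsets to cover the majority

probas = []
for _ in range(n_subsets):
    # Random under-sampling: draw |minority| majority instances, then
    # combine with all minority instances to form one balanced subset.
    sample = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([sample, min_idx])
    Xb, yb = X_tr[idx], y_tr[idx]
    # Three classifiers per subset: RF, AdaBoost+CART, AdaBoost+SVM.
    for clf in (
        RandomForestClassifier(n_estimators=50, random_state=0),
        AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), random_state=0),
        AdaBoostClassifier(SVC(probability=True), n_estimators=10, random_state=0),
    ):
        clf.fit(Xb, yb)
        probas.append(clf.predict_proba(X_te)[:, 1])

# Soft voting: average the predicted probabilities across every classifier
# in every balanced subset, then threshold at 0.5.
y_pred = (np.mean(probas, axis=0) >= 0.5).astype(int)
```

Averaging probabilities rather than hard labels lets a confident minority-class vote from one subset outweigh marginal majority-class votes from others, which is the usual motivation for soft voting in this setting.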
