Published in: International Conference of Computer and Information Technology

A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data



Abstract

Imbalanced learning is the problem of learning from data whose class distribution is highly skewed. Class imbalance arises in a growing number of domains and poses a challenge to traditional classification techniques. Learning from imbalanced data (two or more classes) introduces additional complexity. Studies suggest that ensemble methods can produce more accurate results than standard imbalanced-learning techniques such as sampling and cost-sensitive learning. To address the problem, we propose a new hybrid under-sampling-based ensemble approach (HUSBoost) for imbalanced data, which comprises three basic steps: data cleaning, data balancing, and classification. First, we remove noisy data using Tomek links. We then create several balanced subsets by applying random under-sampling (RUS) to the majority-class instances. These under-sampled majority-class instances, together with the minority-class instances, constitute subsets of the imbalanced dataset; having equal numbers of majority- and minority-class instances, they are balanced subsets of the data. On each balanced subset, random forest (RF), AdaBoost with decision trees (CART), and AdaBoost with support vector machines (SVM) are trained in parallel, and a soft-voting approach combines their outputs. The results of these ensemble classifiers are then averaged over all balanced subsets. We use 27 datasets with different imbalance ratios to verify the effectiveness of the proposed model and compare its experimental results with the RUSBoost and EasyEnsemble methods.
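The three-step pipeline described in the abstract can be sketched roughly as follows, assuming scikit-learn. This is an illustrative sketch, not the authors' implementation: the Tomek-link cleaner is hand-rolled for self-containment, AdaBoost uses its default CART base learner, and a probability-calibrated SVC stands in for the paper's AdaBoost-SVM member.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC

def tomek_links_clean(X, y, majority):
    """Step 1: drop the majority-class member of every Tomek link,
    i.e. each pair of opposite-class mutual nearest neighbors."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    keep = np.ones(len(y), dtype=bool)
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            keep[i if y[i] == majority else j] = False
    return X[keep], y[keep]

def husboost_predict(X, y, X_test, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    majority, minority = classes[counts.argmax()], classes[counts.argmin()]
    X, y = tomek_links_clean(X, y, majority)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y == minority)
    prob = np.zeros(len(X_test))
    for _ in range(n_subsets):
        # Step 2: random under-sampling of the majority class
        # yields a balanced subset
        idx = np.concatenate(
            [rng.choice(maj_idx, size=min_idx.size, replace=False), min_idx])
        Xb, yb = X[idx], y[idx]
        # Step 3: ensemble members trained on the subset; soft voting
        # averages their predicted minority-class probabilities
        members = [RandomForestClassifier(n_estimators=50, random_state=seed),
                   AdaBoostClassifier(n_estimators=50, random_state=seed),
                   SVC(probability=True, random_state=seed)]
        p = [m.fit(Xb, yb).predict_proba(X_test)[:, list(m.classes_).index(minority)]
             for m in members]
        prob += np.mean(p, axis=0) / n_subsets  # average over all subsets
    return np.where(prob >= 0.5, minority, majority)
```

In practice, the `imbalanced-learn` package offers tested implementations of steps 1 and 2 (`TomekLinks` and `RandomUnderSampler`), which would replace the hand-rolled cleaner and the manual index sampling here.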
