IEEE International Performance Computing and Communications Conference

An imbalanced data classification method based on automatic clustering under-sampling

Abstract

Classification of imbalanced datasets has become one of the most challenging problems in big data mining. Because positive samples are far fewer than negative samples, traditional learning algorithms tend to suffer from low accuracy, poor generalization performance, and other defects. Ensemble construction is an important approach to this problem. In particular, ensemble construction algorithms based on random under-sampling or on clustering can effectively improve classification performance. However, the former easily loses information, and the latter increases complexity. In this paper, we propose ACUS, an improved ensemble algorithm based on automatic clustering and under-sampling. ACUS first clusters the samples according to their weights, and then constructs a balanced dataset that consists of a certain percentage of the majority class and all of the minority class from each cluster. The datasets built this way are used with the AdaBoost algorithm to obtain an ensemble classifier. Experimental results demonstrate the advantages of the proposed algorithm in terms of accuracy, simplicity, and high stability.
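The per-cluster sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how clusters are found from sample weights, what percentage of the majority class is kept, or how AdaBoost is wired in. The sketch therefore assumes that cluster assignments are already available, and the function name `undersample_by_cluster` and the parameters `majority_fraction`, `minority_label`, and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def undersample_by_cluster(labels, cluster_ids, majority_fraction,
                           minority_label=1, seed=0):
    """Return sorted indices of the samples kept after under-sampling.

    Assumption (not from the paper): cluster_ids is computed elsewhere.
    In every cluster, all minority samples are kept and a random
    `majority_fraction` share of the majority samples is kept.
    """
    if not 0.0 < majority_fraction <= 1.0:
        raise ValueError("majority_fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group sample indices by cluster, separately for each class.
    minority = defaultdict(list)
    majority = defaultdict(list)
    for idx, (label, cluster) in enumerate(zip(labels, cluster_ids)):
        target = minority if label == minority_label else majority
        target[cluster].append(idx)

    kept = []
    # Sorted iteration keeps the result reproducible for a given seed.
    for cluster in sorted(set(cluster_ids)):
        kept.extend(minority[cluster])  # every minority sample survives
        pool = majority[cluster]
        # Round up so a cluster with majority samples keeps at least one.
        n_keep = math.ceil(majority_fraction * len(pool))
        kept.extend(rng.sample(pool, n_keep))
    return sorted(kept)


# Toy data: 12 majority samples (label 0), 3 minority samples (label 1).
labels = [0, 0, 0, 0, 1] + [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
cluster_ids = [0] * 5 + [1] * 10
kept = undersample_by_cluster(labels, cluster_ids, majority_fraction=0.25)
print(len(kept))  # 6: all 3 minority samples plus 3 sampled majority samples
```

The reduced index set would then be used to train one base learner. Repeating the draw with different seeds gives several balanced subsets for an ensemble. The weight-driven clustering and the AdaBoost training themselves are left out because the abstract gives no details on them.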