An imbalanced data classification method based on automatic clustering under-sampling

Abstract

Classification of imbalanced datasets has become one of the most challenging problems in big data mining. Because positive samples are far fewer than negative samples, traditional learning algorithms tend to suffer from low accuracy, poor generalization, and other defects. Ensemble construction is an important approach to this problem. In particular, ensemble construction algorithms based on random under-sampling or clustering can effectively improve classification performance. However, the former easily loses information and the latter increases complexity. In this paper, we propose ACUS, an improved ensemble algorithm based on automatic clustering and under-sampling. ACUS first clusters the samples according to their weights, and then constructs balanced datasets consisting of a certain percentage of the majority class and all of the minority class from each cluster. These datasets are then used with the AdaBoost algorithm to build an ensemble classifier. Experimental results demonstrate the advantages of the proposed algorithm in terms of accuracy, simplicity, and stability.
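
The under-sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' ACUS implementation: the abstract does not specify how the automatic clustering or the sample weights work, so the sketch assumes cluster assignments are already given, and it omits the AdaBoost stage entirely. The function name `cluster_undersample` and the parameters `fraction`, `majority`, and `seed` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def cluster_undersample(labels, clusters, fraction, majority=0, seed=0):
    """Return sorted indices of a reduced training set (illustrative only).

    Keeps every minority sample and a random `fraction` of the majority
    samples, drawn separately from each cluster.
    """
    rng = random.Random(seed)
    majority_by_cluster = defaultdict(list)
    kept = []
    for idx, (label, cluster) in enumerate(zip(labels, clusters)):
        if label == majority:
            majority_by_cluster[cluster].append(idx)
        else:
            kept.append(idx)  # minority samples are always kept
    for cluster in sorted(majority_by_cluster):
        members = majority_by_cluster[cluster]
        # Assumption: keep at least one majority sample per cluster,
        # and never more than the cluster contains.
        k = min(len(members), max(1, round(fraction * len(members))))
        kept.extend(rng.sample(members, k))
    return sorted(kept)


# Toy data: 12 majority (label 0) and 3 minority (label 1) samples
# spread over 3 clusters; cluster assignments are assumed, not computed.
labels = [0] * 12 + [1] * 3
clusters = [0] * 4 + [1] * 4 + [2] * 4 + [0, 1, 2]
subset = cluster_undersample(labels, clusters, fraction=0.25)
```

With `fraction=0.25`, one majority sample survives from each cluster of four, so the reduced set holds three majority and three minority samples. In a full pipeline, subsets like this one would be passed to a boosting learner; that part is left out here because the abstract gives no detail on it.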

Bibliographic details

Source
IEEE International Performance Computing and Communications Conference | 2016 | 477p | 8 pages
Authors
Xiaoheng Deng; Weijian Zhong; Ju Ren; Detian Zeng; Honggang Zhang
Original format: PDF
Chinese Library Classification: TN915-53
Keywords
Clustering algorithms; Classification algorithms; Training; Algorithm design and analysis; Boosting; Partitioning algorithms; Time complexity

Similar literature

1. A design of information granule-based under-sampling method in imbalanced data classification [J]. Liu Tianyu, Zhu Xiubin, Pedrycz Witold. Soft computing: A fusion of foundations, methodologies and applications, 2020, Issue 22

2. Evolutionary under-sampling based bagging ensemble method for imbalanced data classification [J]. Bo SUN, Haiyan CHEN, Jiandong WANG. Frontiers of computer science in China, 2018, Issue 2

3. Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method [J]. Elhassan AT, Aljourf M, Al-Mohanna F. Global Journal of Technology and Optimization, 2016, Issue 1

4. An imbalanced data classification method based on automatic clustering under-sampling [C]. Xiaoheng Deng, Weijian Zhong, Ju Ren. IEEE International Performance Computing and Communications Conference, 2016

5. Shape Theoretic and Machine Learning Based Methods for Automatic Clustering and Classification of Cardiomyocytes Based on Action Potential Morphology [D]. Gorospe, Giann. 2018

6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung. 2016

7. CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification [O]. Rayhan, Farshid, Ahmed, Sajid, Mahbub, Asif. 2017
