A Clustering-Based Boosting Algorithm for Imbalanced Data Classification

Hu Xiaosheng; Zhang Runjing; Zhong Yong


Abstract

Imbalanced data are common in real-world applications, and their classification is an active topic in machine learning. Traditional classification algorithms tend to recognize the minority class of imbalanced datasets poorly. To address this, the paper proposes a clustering-based improved AdaBoost classification algorithm. The algorithm first performs clustering-based undersampling: it runs K-means clustering on the majority class, extracts the cluster centroids, and combines as many centroids as there are minority-class samples with all minority-class samples to form a new balanced training set. To avoid the loss of classification accuracy that results when too few minority-class samples make the training set too small, SMOTE (Synthetic Minority Oversampling Technique) is combined with the clustering-based undersampling. Next, drawing on cost-sensitive learning, the misclassification loss function of the AdaBoost base classifier is modified to assign asymmetric misclassification losses to samples of different classes. Experimental results show that the algorithm makes the training samples more representative and improves the classification accuracy of the minority class while maintaining overall classification performance.
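The two main steps described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_centroids`, `cluster_undersample`, `update_weights`), the label convention (0 = majority, 1 = minority), and the `cost` dictionary are all hypothetical. In particular, the abstract does not give the exact form of the modified loss, so the cost-scaled weight update below is only one plausible way to make AdaBoost's reweighting asymmetric. The SMOTE step is omitted.

```python
import math
import random


def kmeans_centroids(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Replace the majority class with as many centroids as there are minority samples."""
    k = len(minority)  # requires len(majority) >= k
    centroids = kmeans_centroids(majority, k, seed=seed)
    X = centroids + [list(p) for p in minority]
    y = [0] * k + [1] * len(minority)  # assumed convention: 0 = majority, 1 = minority
    return X, y


def update_weights(weights, y_true, y_pred, alpha, cost):
    """AdaBoost-style reweighting with a per-class cost on mistakes (an assumption,
    not the paper's formula): misclassified samples are scaled by cost[true_label]."""
    new = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t != p:
            new.append(w * cost[t] * math.exp(alpha))
        else:
            new.append(w * math.exp(-alpha))
    z = sum(new)  # normalize so the weights remain a distribution
    return [w / z for w in new]


majority = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
minority = [[2.0, 2.0], [3.0, 3.0]]
X_bal, y_bal = cluster_undersample(majority, minority)

# One boosting round: the third sample (minority) is misclassified and costs twice as much.
w_next = update_weights([0.25] * 4, [0, 0, 1, 1], [0, 0, 0, 1], alpha=0.5, cost={0: 1.0, 1: 2.0})
```

Setting the number of clusters to the minority-class size is what makes the resulting set balanced; using centroids rather than a random subset is what the abstract credits for the training samples being more representative.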

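Bibliographic details

Source
《集成技术》 (Journal of Integration Technology) | 2014, No. 2 | pp. 35-41 | 7 pages
Authors
Hu Xiaosheng; Zhang Runjing; Zhong Yong
Affiliations

School of Electronic and Information Engineering, Foshan University, Foshan 528000;

Information and Educational Technology Center, Foshan University, Foshan 528000;

School of Electronic and Information Engineering, Foshan University, Foshan 528000

Original format: PDF
Language of text: Chinese
CLC classification: Artificial intelligence theory
Keywords
Imbalanced data classification; K-means clustering; AdaBoost; ensemble learning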
