PCBoost: A New Learning Algorithm for Imbalanced Data

Li Xiongfei (李雄飞); Li Jun (李军); Dong Yuanfang (董元方); Qu Chengwei (屈成伟)


Abstract

Imbalanced data is widespread in real-world applications, and classifying it is an active research topic in machine learning. Most traditional classification algorithms assume a balanced class distribution or equal misclassification costs, so they perform poorly on imbalanced data. This paper proposes PCBoost, a classification algorithm for imbalanced data. The algorithm builds decision trees using the information gain ratio as the splitting criterion and uses them as weak classifiers. At the start of each iteration, it applies a data synthesis method to add synthetic minority-class examples and so balance the training information. After each sub-classifier is formed, it corrects the "perturbation" by deleting the synthetic examples that were not classified correctly. The paper also discusses the data synthesis method, gives a theoretical analysis of the training error bound, and analyzes the choice of ensemble learning parameters. Experimental results show that PCBoost has advantages on imbalanced data classification problems.
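Two steps named in the abstract can be illustrated with a minimal sketch: the information gain ratio used as the splitting criterion, and the removal of misclassified synthetic examples after a sub-classifier is trained. This is not the authors' implementation. The function names, the categorical-feature assumption, and the `(features, label, is_synthetic)` tuple layout are hypothetical choices made for illustration. The gain ratio follows the standard definition (information gain divided by split information).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus the weighted entropy of the children.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(
        (len(g) / total) * math.log2(len(g) / total) for g in groups.values()
    )
    # A feature with a single value carries no split information; score it 0.
    return info_gain / split_info if split_info > 0 else 0.0


def drop_misclassified_synthetic(examples, predict):
    """Keep every real example; keep a synthetic one only if `predict` labels it correctly.

    `examples` is a list of (features, label, is_synthetic) tuples, a
    hypothetical layout assumed for this sketch. `predict` is any callable
    mapping features to a predicted label.
    """
    return [
        (x, y, synthetic)
        for x, y, synthetic in examples
        if not synthetic or predict(x) == y
    ]
```

For instance, `gain_ratio(["x", "x", "y", "y"], [0, 0, 1, 1])` returns `1.0`, because the feature separates the two classes perfectly and splits the data into two equal halves. How the synthetic examples are generated and how the boosting weights are updated are not specified in the abstract, so they are left out here.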

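Bibliographic record

Source
Chinese Journal of Computers (《计算机学报》) | 2012, Issue 2 | pp. 202-209 | 8 pages
Authors
Li Xiongfei (李雄飞); Li Jun (李军); Dong Yuanfang (董元方); Qu Chengwei (屈成伟)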
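Affiliations

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012;

Department of Applied Mathematics, Changchun University of Science and Technology, Changchun 130022;

School of Economics and Management, Changchun University of Science and Technology, Changchun 130022;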

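Original format: PDF
Text language: Chinese
CLC classification: Artificial intelligence theory
Keywords
data mining; imbalanced data; ensemble learning; boosting; perturbation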

