Research on Imbalanced Data Classification and Its Application

Journal: 《计算机应用与软件》 (Computer Applications and Software)

Abstract

Traditional machine learning algorithms classify the minority class of imbalanced data with low accuracy. This paper analyzes the causes of that problem and proposes an undersampling method to improve minority-class classification accuracy. The method clusters the samples multiple times with the k-means algorithm and deletes majority-class noise as well as majority-class samples that overlap heavily with the minority class. A deletion factor λ is also introduced to reduce the risk that the majority class loses its characteristic information. Experiments on UCI datasets show that, after the data are processed with this method, classification algorithms achieve higher recall and F-measure on the minority class, which indicates that the method effectively improves minority-class classification accuracy. Finally, the method is applied to predicting the post-operative life expectancy of lung cancer patients, where the recall and F-measure for one-year mortality improve by 42% and 23%, respectively.
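
The abstract does not say how overlap is scored or exactly how λ limits deletion, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that a majority sample's overlap score is the average minority share of the clusters it lands in across repeated k-means runs, and that λ caps the fraction of the majority class that may be deleted. The names `kmeans_labels`, `undersample_majority`, `recall_and_f`, and the `threshold` parameter are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns a cluster index for every row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave the center of an empty cluster unchanged
                centers[j] = members.mean(axis=0)
    return labels


def undersample_majority(X, y, minority_label, k=4, lam=0.5,
                         rounds=5, threshold=0.5, seed=0):
    """Drop majority samples that repeatedly fall in minority-heavy clusters.

    Assumption: lam is read as the maximum fraction of majority samples
    that may be deleted; the abstract does not define it precisely.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    is_min = y == minority_label
    votes = np.zeros(len(X))
    for _ in range(rounds):
        labels = kmeans_labels(X, k, rng)
        for j in range(k):
            in_cluster = labels == j
            if in_cluster.any():
                # Minority share of the cluster, credited to each member.
                votes[in_cluster] += is_min[in_cluster].mean()
    score = votes / rounds
    maj_idx = np.flatnonzero(~is_min)
    candidates = maj_idx[score[maj_idx] >= threshold]
    budget = int(lam * len(maj_idx))  # cap on deletions set by lam
    drop = candidates[np.argsort(-score[candidates])][:budget]
    keep = np.ones(len(X), dtype=bool)
    keep[drop] = False
    return X[keep], y[keep]


def recall_and_f(y_true, y_pred, positive):
    """Recall and F-measure (F1) for the class labelled `positive`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f
```

In this sketch, minority samples are never removed, and setting `lam=0` returns the data unchanged. A typical use would be to call `undersample_majority` on the training split only, fit any classifier on the result, and compare `recall_and_f` for the minority class before and after.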
