Journal: Chinese Journal of Health Statistics (《中国卫生统计》)

Application of Resampling Techniques to the Classification of Imbalanced Medical Data

Abstract

Objective: Using metabolic syndrome as an example, to examine how imbalanced data affect classification algorithms, to balance the data with resampling techniques, and to compare the classification performance of a neural network and a decision tree.

Methods: (1) A BP neural network and a C4.5 decision tree were used to classify imbalanced datasets with different imbalance ratios. (2) Four resampling techniques (random oversampling, random undersampling, hybrid sampling, and synthetic data generation) were applied. The performance of the neural network and the decision tree was compared before and after resampling and across the four techniques, with F-measure, G-mean, and AUC as the evaluation indices.

Results: (1) As the imbalance ratio of the dataset increased, AUC decreased gradually, indicating that classifier performance declined as the imbalance became more severe. (2) Among the four resampling techniques, random oversampling gave the best classification performance for both the BP neural network and the C4.5 decision tree.

Conclusion: Classification performance falls as the prevalence of disease in the dataset decreases. Random oversampling improved the classification performance of the algorithms. Random oversampling is therefore recommended before applying classification algorithms to imbalanced medical data.
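To make the random oversampling step and two of the evaluation indices concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `random_oversample` and `f_measure_and_g_mean`, the toy data, and the choice of the balanced F-measure (beta = 1) are all assumptions made for illustration. G-mean is taken here as the geometric mean of sensitivity and specificity.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """Hypothetical helper: F-measure (assumed beta = 1) and G-mean
    computed from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g = math.sqrt(recall * specificity)
    return f, g


# Toy data (made up): 2 positive cases and 8 negative cases.
X = [[i] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the data used to compute F-measure, G-mean, or AUC.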
