Application of Resampling Techniques to the Classification of Imbalanced Medical Data

闫慈; 田翔华; 阿拉依·阿汗; 张伟文; 曹明芹

Abstract

Objective: Using metabolic syndrome as an example, to examine how imbalanced data affect classification algorithms, to balance the data with resampling techniques, and to compare the classification performance of a neural network and a decision tree.

Methods: (1) A BP neural network and a C4.5 decision tree were used to classify imbalanced datasets with different imbalance ratios. (2) Four resampling techniques were applied: random oversampling, random undersampling, hybrid sampling, and synthetic data generation. Classification performance of the neural network and the decision tree was compared before and after resampling and among the four resampling techniques, with F-measure, G-mean and AUC as the evaluation metrics.

Results: (1) As the imbalance ratio of the dataset increased, the AUC decreased gradually, indicating that classifier performance declines as the imbalance becomes more severe. (2) Of the four resampling techniques, random oversampling gave the best classification performance for both the BP neural network and the C4.5 decision tree.

Conclusion: Classification performance declines as the prevalence of the disease in the dataset decreases. Random oversampling improved the performance of the classification algorithms. Random oversampling is therefore recommended before applying a classification algorithm to imbalanced medical data.
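The abstract does not say which software or settings the authors used, so the following is only a minimal sketch of two of the techniques it names (random oversampling and random undersampling) and two of its metrics (F-measure and G-mean). The function names, the 1:1 target ratio, and the toy data are assumptions made for illustration and are not taken from the paper. In practice, resampling is usually applied to the training split only.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed 1:1 target)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-measure, G-mean) for a binary problem.

    F-measure is the harmonic mean of precision and recall; G-mean is
    the geometric mean of sensitivity (recall) and specificity.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return f1, math.sqrt(recall * specificity)


# Toy data (hypothetical): 2 positive cases among 10 records.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))

# A classifier that always predicts the majority class is 80% accurate
# on this data, yet both imbalance-aware metrics are zero.
print(f_measure_and_g_mean(y, [0] * 10))
```

The last line illustrates why the abstract evaluates with F-measure, G-mean and AUC rather than plain accuracy: a model that ignores the minority class can still look accurate on an imbalanced dataset.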

Bibliographic details

Source
《中国卫生统计》 (Chinese Journal of Health Statistics), 2018, No. 2, pp. 177-180, 185 (5 pages)
Authors
闫慈; 田翔华; 阿拉依·阿汗; 张伟文; 曹明芹
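Author affiliations

Department of Epidemiology and Health Statistics, School of Public Health, Xinjiang Medical University, 830000;

Department of Computer Science, School of Medical Engineering and Technology, Xinjiang Medical University;

Department of Epidemiology and Health Statistics, School of Public Health, Xinjiang Medical University, 830000;

Department of Epidemiology and Health Statistics, School of Public Health, Xinjiang Medical University, 830000;

Department of Epidemiology and Health Statistics, School of Public Health, Xinjiang Medical University, 830000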

Original format: PDF
Language of text: Chinese
Keywords
metabolic syndrome; imbalanced dataset; resampling techniques; neural network; decision tree

