Journal: Computers in Biology and Medicine

A learning method for the class imbalance problem with medical data sets.

Abstract

In medical data sets, data are predominantly composed of "normal" samples with only a small percentage of "abnormal" ones, leading to the so-called class imbalance problem. In class imbalance problems, feeding all the data into the classifier to build the learning model usually biases learning toward the majority class. To deal with this, this paper uses a strategy that over-samples the minority class and under-samples the majority class to balance the data sets. For the majority class, this paper builds a Gaussian-type fuzzy membership function and applies an alpha-cut to reduce the data size; for the minority class, it uses the mega-trend diffusion membership function to generate virtual samples. Furthermore, after balancing the class sizes, this paper extends the data attributes into a higher-dimensional space using classification-related information to enhance classification accuracy. Two medical data sets, Pima Indians' diabetes and BUPA liver disorders, are employed to illustrate the approach. The results indicate that the proposed method has better classification performance than SVM, the C4.5 decision tree, and two other studies.
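The majority-class reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: the membership is a plain Gaussian centered on each attribute's mean, a sample is kept only if its membership reaches the threshold `alpha` on every attribute, and the names `gaussian_membership` and `alpha_cut_undersample`, the default `alpha=0.5`, and the toy data are all hypothetical.

```python
import math
import statistics


def gaussian_membership(x, mean, std):
    """Gaussian-type fuzzy membership of x: 1.0 at the mean, decaying toward 0."""
    if std == 0:
        # Degenerate attribute: every value equals the mean.
        return 1.0 if x == mean else 0.0
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2))


def alpha_cut_undersample(samples, alpha=0.5):
    """Keep rows whose membership is >= alpha on every attribute.

    Assumption: the cut is applied per attribute and combined with min();
    the paper may combine memberships differently.
    """
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    kept = []
    for row in samples:
        memberships = [
            gaussian_membership(value, mean, std)
            for value, mean, std in zip(row, means, stds)
        ]
        if min(memberships) >= alpha:
            kept.append(row)
    return kept


# Toy majority-class data (made up): four similar rows and one far-off row.
majority = [(1.0, 10.0), (1.1, 10.5), (0.9, 9.5), (1.0, 10.2), (5.0, 30.0)]
reduced = alpha_cut_undersample(majority, alpha=0.5)
```

With this toy data the far-off row falls below the cut and is dropped, which shrinks the majority class. Raising `alpha` removes more samples, so in this sketch it acts as the knob for how aggressively the majority class is reduced.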
