Source: Biomedical Engineering International Conference

Processing and analysis of imbalanced liver cancer patient data by case-based reasoning

Abstract

Clinical data research is one of the fastest-growing fields worldwide. Most clinical datasets are imbalanced, which can cause problems in analysis. In this study, over-sampling and under-sampling are used to handle data imbalance. Case-based reasoning (CBR) is used to develop classification models that predict the recurrence status of patients with liver cancer. The classification results of these two methods are compared with those of the original imbalanced dataset using standard indicators: sensitivity, specificity, balanced accuracy (BAC), positive predictive value (PPV), negative predictive value (NPV), and accuracy. According to the preliminary classification results, the mean BAC of both balancing methods, under-sampling (66.07%) and over-sampling (54.24%), shows a significant improvement over that of the imbalanced dataset (48.33%). Most importantly, under-sampling achieves the highest mean accuracy of the three datasets (under-sampling: 66.76%, over-sampling: 53.47%, imbalanced: 48.58%). With under-sampling, mean PPV, NPV, and accuracy all exceed 65% (PPV: 65.44%, NPV: 69.44%, accuracy: 66.76%). Balanced datasets can benefit classification models and effectively reduce biased interpretations.
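
As a rough illustration of the indicators and balancing steps named in the abstract, the sketch below computes sensitivity, specificity, BAC, PPV, NPV, and accuracy from binary labels, and balances a dataset by resampling. The abstract does not say how the sampling was done, so plain random under-sampling and over-sampling are assumed here. The function names (`indicators`, `random_resample`) and the toy 10:90 data are hypothetical, not taken from the paper, and the CBR classifier itself is not modeled.

```python
import math
import random
from collections import Counter


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def indicators(y_true, y_pred, positive=1):
    """Compute the six indicators listed in the abstract for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "bac": (sensitivity + specificity) / 2,  # balanced accuracy
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def random_resample(records, labels, mode="under", seed=0):
    """Balance classes by random sampling (an assumed strategy).

    mode="under": shrink every class to the size of the smallest class.
    mode="over":  grow every class to the size of the largest class by
                  duplicating randomly chosen records.
    """
    rng = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    sizes = [len(recs) for recs in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    pairs = []
    for lab, recs in by_class.items():
        if mode == "under":
            chosen = rng.sample(recs, target)
        else:
            extra = [rng.choice(recs) for _ in range(target - len(recs))]
            chosen = recs + extra
        pairs.extend((rec, lab) for rec in chosen)
    rng.shuffle(pairs)
    return [rec for rec, _ in pairs], [lab for _, lab in pairs]


# Toy data: 10 positive cases (recurrence) and 90 negative cases.
toy_records = list(range(100))
toy_labels = [1] * 10 + [0] * 90

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100
baseline = indicators(toy_labels, always_negative)

under_records, under_labels = random_resample(toy_records, toy_labels, "under")
over_records, over_labels = random_resample(toy_records, toy_labels, "over")
print(baseline["accuracy"], baseline["bac"])
print(Counter(under_labels), Counter(over_labels))
```

On this toy 10:90 split, a model that always predicts the majority class reaches 90% accuracy but only 50% BAC, which is why BAC is a useful companion to accuracy on imbalanced data. In practice, resampling should be applied to the training cases only, so that the evaluation set keeps its original class distribution.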
