Conference: International Conference on Data Mining

Random Under-Sampling Ensemble Methods for Highly Imbalanced Rare Disease Classification

Abstract

Classification on imbalanced data presents many challenges to researchers. In healthcare settings, rare disease identification is one of the most difficult kinds of imbalanced classification: it is hard to correctly identify the true positive rare disease patients among a much larger number of negative patients. Predictions from traditional models tend to be biased towards the much larger negative class. To gain better predictive accuracy, we select and test several modern machine learning algorithms for imbalanced data on an empirical rare disease dataset. The training data is constructed from real-world patient diagnosis and prescription data. Finally, we compare the performance of the various algorithms. We find that the random under-sampling Random Forest algorithm achieves more than a 40% improvement over the traditional logistic model in this particular example. We also observe that not all bagging methods outperform traditional methods. For example, the random under-sampling LASSO is inferior to the benchmark in our results. Researchers need to test and select appropriate methods accordingly in real-world applications.
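The general idea of a random under-sampling ensemble can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a hypothetical one-feature threshold rule (standing in for Random Forest or LASSO), the 1:1 class ratio after under-sampling is an assumption, and the data is synthetic. All function names are made up for illustration.

```python
import random
from statistics import mean

def undersample(X, y, rng):
    """Keep every positive and an equally sized random subset of negatives (assumed 1:1 ratio)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_stump(X, y):
    """Hypothetical base learner: threshold halfway between the two class means."""
    pos_mean = mean(x for x, label in zip(X, y) if label == 1)
    neg_mean = mean(x for x, label in zip(X, y) if label == 0)
    # Second element records whether positives lie above the threshold.
    return (pos_mean + neg_mean) / 2, pos_mean >= neg_mean

def fit_ensemble(X, y, n_models=25, seed=0):
    """Train each base learner on its own balanced, randomly under-sampled subset."""
    rng = random.Random(seed)
    return [fit_stump(*undersample(X, y, rng)) for _ in range(n_models)]

def predict(ensemble, x):
    """Majority vote over the base learners."""
    votes = sum(1 for thr, pos_above in ensemble if (x > thr) == pos_above)
    return 1 if 2 * votes > len(ensemble) else 0

# Synthetic, highly imbalanced data: 1000 negatives, 20 positives.
data_rng = random.Random(42)
X = [data_rng.gauss(0, 1) for _ in range(1000)] + [data_rng.gauss(3, 1) for _ in range(20)]
y = [0] * 1000 + [1] * 20

ensemble = fit_ensemble(X, y)
```

Each base learner sees all positives but a different random subset of negatives, so the ensemble as a whole still uses much of the negative class while no single learner is dominated by it. Whether this helps depends on the base learner, which is consistent with the abstract's observation that results differ between Random Forest and LASSO.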