Conference: IEEE International Conference on Parallel and Distributed Systems

Multiple Balance Subsets Stacking for Imbalanced Healthcare Datasets

Abstract

Accurate prediction is highly important for clinical decision making and early treatment. In this paper, we study the imbalanced data problem in prediction, a key challenge in the healthcare area. Imbalanced datasets bias classifiers towards the majority class, leading to unsatisfactory prediction performance on the minority class; this is known as the imbalance problem. Existing imbalance learning methods may suffer from issues such as information loss, overfitting, and high training time cost. To tackle these issues, we propose a novel ensemble learning method called Multiple bAlance Subsets Stacking (MASS), which exploits a multiple balance subsets construction strategy. Furthermore, we improve MASS by introducing parallelism (Parallel MASS) to reduce the training time cost. We evaluate MASS on three real-world healthcare datasets, and experimental results demonstrate that it outperforms state-of-the-art methods in terms of AUC, F1-score, and MCC. A speedup analysis shows that Parallel MASS greatly reduces the training time cost on large datasets, and that its speedup increases as the data size grows.
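The abstract does not say how the balanced subsets are built, which base learners are used, or how they are stacked. The sketch below is therefore only a rough illustration of the general idea, not the authors' MASS implementation. It assumes binary labels, assumes each subset pairs all minority samples with one disjoint chunk of the majority class, and uses a toy nearest-centroid base learner and a toy logistic-regression meta-learner. All function names (`build_balanced_subsets`, `fit_stack`, and so on) are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def build_balanced_subsets(X, y, minority_label=1, seed=0):
    """Pair every minority sample with one disjoint chunk of the majority class."""
    rng = random.Random(seed)
    # Relabel to 1 (minority) and 0 (everything else); binary setting assumed.
    minority = [(x, 1) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, 0) for x, t in zip(X, y) if t != minority_label]
    if not minority or not majority:
        raise ValueError("need samples from both classes")
    rng.shuffle(majority)
    k = len(minority)
    # The last chunk may hold fewer than k majority samples.
    return [minority + majority[i:i + k] for i in range(0, len(majority), k)]


def fit_centroids(subset):
    """Toy base learner (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, t in subset:
        acc = sums.setdefault(t, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[t] = counts.get(t, 0) + 1
    return {t: [v / counts[t] for v in acc] for t, acc in sums.items()}


def score(model, x):
    """Positive when x lies closer to the minority centroid than to the majority one."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return dist2(model[0]) - dist2(model[1])


def fit_logistic(F, targets, lr=0.1, epochs=100):
    """Toy meta-learner (assumption): logistic regression trained with plain SGD."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        for f, t in zip(F, targets):
            z = b + sum(wi * fi for wi, fi in zip(w, f))
            # Clamp z so math.exp cannot overflow.
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - t
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def fit_stack(X, y, minority_label=1, seed=0, workers=4):
    """Fit one base learner per balanced subset, then a meta-learner on their scores."""
    subsets = build_balanced_subsets(X, y, minority_label, seed)
    # Base learners are independent, so they can be fitted concurrently.
    # Threads only illustrate the structure; CPU-bound pure-Python work would
    # need processes to see a real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        models = list(pool.map(fit_centroids, subsets))
    # Simplification: meta-features are in-sample scores; stacking is normally
    # done with out-of-fold predictions to limit leakage.
    F = [[score(m, x) for m in models] for x in X]
    targets = [1 if t == minority_label else 0 for t in y]
    w, b = fit_logistic(F, targets)
    return models, w, b


def predict_is_minority(stack, x):
    """Return True when the stacked model assigns x to the minority class."""
    models, w, b = stack
    z = b + sum(wi * score(m, x) for wi, m in zip(w, models))
    return z > 0
```

Calling `fit_stack(X, y)` on a list of feature tuples and a list of 0/1 labels returns the fitted stack, which can then be passed to `predict_is_minority`. Because every majority sample lands in exactly one subset, no majority data is discarded, while each base learner still sees roughly balanced classes.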