Source: PLoS Clinical Trials (U.S. National Institutes of Health literature)

Adaptive Swarm Balancing Algorithms for rare-event prediction in imbalanced healthcare data

Abstract

Clinical data analysis and forecasting have made substantial contributions to disease control, prevention, and detection. However, such data usually suffer from highly imbalanced class distributions. In this paper, we aim to formulate effective methods for rebalancing binary imbalanced datasets in which the positive samples form the minority. We investigate two meta-heuristic algorithms, particle swarm optimization and the bat algorithm, and apply them to strengthen the effect of the synthetic minority over-sampling technique (SMOTE) when pre-processing the datasets. One approach processes the full dataset as a whole. The other splits the dataset and adaptively processes it one segment at a time. The experimental results reported in this paper show that the performance improvements obtained by the former approach do not scale to larger datasets. The latter methods, which we call Adaptive Swarm Balancing Algorithms, yield significant improvements in efficiency and effectiveness on large datasets, where the first approach is not viable. We also find the segment-wise approach more consistent with the practice of typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. The proposed methods lead to more credible classifier performance and shorter run times than the brute-force method.
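The segment-wise idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the two tuned SMOTE parameters are taken to be the oversampling rate and the neighbour count `k`, the fitness is the balanced accuracy of a simple k-nearest-neighbour classifier on a held-out validation set, and all function names and PSO coefficients are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k, rng):
    """Minimal SMOTE: interpolate between minority samples and their k nearest minority neighbours."""
    k = max(1, min(k, len(X_min) - 1))
    d = np.linalg.norm(X_min[:, None] - X_min[None], axis=2)
    np.fill_diagonal(d, np.inf)                      # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)        # seed sample for each synthetic point
    neigh = nn[base, rng.integers(0, k, n_new)]      # one random neighbour per seed
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def rebalance(X, y, rate, k, rng):
    """Append round(rate * n_minority) synthetic positives; no-op if fewer than 2 positives."""
    X_min = X[y == 1]
    n_new = int(round(rate * len(X_min)))
    if len(X_min) < 2 or n_new == 0:
        return X, y
    X_syn = smote(X_min, n_new, k, rng)
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])

def knn_predict(X_tr, y_tr, X_te, k=5):
    d = np.linalg.norm(X_te[:, None] - X_tr[None], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (y_tr[nn].mean(axis=1) > 0.5).astype(int)

def balanced_accuracy(y_true, y_pred):
    return 0.5 * ((y_pred[y_true == 1] == 1).mean() + (y_pred[y_true == 0] == 0).mean())

def fitness(params, X_tr, y_tr, X_val, y_val, seed):
    # Assumed objective: validation balanced accuracy after rebalancing with (rate, k).
    Xb, yb = rebalance(X_tr, y_tr, params[0], int(round(params[1])), np.random.default_rng(seed))
    return balanced_accuracy(y_val, knn_predict(Xb, yb, X_val))

def pso(objective, lo, hi, n_particles=8, n_iter=10, seed=0):
    """Plain global-best PSO that maximises `objective` inside the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([objective(p) for p in pos])
    g = pbest[pbest_f.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([objective(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmax()].copy()
    return g, pbest_f.max()

def adaptive_balance(X, y, X_val, y_val, n_segments=4, seed=0):
    """Tune (rate, k) separately on each segment, rebalance it, and concatenate the results."""
    lo, hi = np.array([1.0, 1.0]), np.array([10.0, 10.0])   # assumed search bounds
    Xs_out, ys_out = [], []
    for i, idx in enumerate(np.array_split(np.arange(len(y)), n_segments)):
        Xs, ys = X[idx], y[idx]
        best, _ = pso(lambda p: fitness(p, Xs, ys, X_val, y_val, seed + i), lo, hi, seed=seed + i)
        Xb, yb = rebalance(Xs, ys, best[0], int(round(best[1])), np.random.default_rng(seed + i))
        Xs_out.append(Xb)
        ys_out.append(yb)
    return np.vstack(Xs_out), np.concatenate(ys_out)
```

Calling `adaptive_balance(X, y, X_val, y_val)` with 0/1 integer labels returns the original rows plus synthetic positives. Because each segment is searched independently, the pairwise distance matrices stay small, which is one plausible reading of why a segment-wise scheme would scale better than processing the whole dataset at once. The whole-dataset variant corresponds to `n_segments=1`.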
