Entropy (NIH literature collection)

Using Background Knowledge from Preceding Studies for Building a Random Forest Prediction Model: A Plasmode Simulation Study

Abstract

There is increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to discover complex data patterns automatically. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. Researchers may also use external information for variable selection to improve model interpretability and variable selection accuracy, and thereby prediction quality. However, it is unclear to what extent, if at all, RF and other ML methods benefit from external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the Lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a data set from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes and 1152 candidate predictor variables (mainly sparse binary variables). When the scope of candidate predictors was reduced based on external knowledge, RF models achieved better calibration, that is, better agreement between predictions and observed outcome rates. However, prediction quality as measured by cross-entropy, AUROC or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.
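
The performance measures named in the abstract can be illustrated with a minimal sketch, assuming binary outcomes coded 0/1 and predicted probabilities from any model. The function names and the demo data below are hypothetical and are not taken from the study. The calibration summary shown (mean predicted probability minus observed outcome rate) is only one simple way to quantify agreement between predictions and observed outcome rates, and may differ from the calibration measure the authors used.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the outcomes (log loss)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def auroc(y_true, p_pred):
    """Probability that a random case is ranked above a random non-case.

    Ties count as one half. This pairwise form is O(n_pos * n_neg) and is
    meant for illustration, not for data sets of the size used in the study.
    """
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true, p_pred):
    """Mean predicted probability minus observed outcome rate (0 is ideal)."""
    return sum(p_pred) / len(p_pred) - sum(y_true) / len(y_true)


# Hypothetical demo data: four individuals, two with the outcome.
y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]

print("Brier score:", brier_score(y_demo, p_demo))
print("Cross-entropy:", cross_entropy(y_demo, p_demo))
print("AUROC:", auroc(y_demo, p_demo))
print("Calibration-in-the-large:", calibration_in_the_large(y_demo, p_demo))
```

AUROC depends only on the ranking of the predictions, whereas calibration-in-the-large depends only on their average level, so a model can improve on one without changing on the other. The Brier score and cross-entropy respond to both aspects.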
