Journal of Applied Statistics

Propensity score prediction for electronic healthcare databases using super learner and high-dimensional propensity score methods

Abstract

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a 'library' of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of SL in its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
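
The cross-validated selection step described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: it shows only the "discrete" form of SL, which picks the single candidate with the lowest cross-validated negative log-likelihood (the full SL typically fits a weighted combination of candidates), and it does not cover hdPS variable selection. The two candidate learners, the simulated data, and every function name here are assumptions made for the example.

```python
import math
import random

def neg_log_lik(y, p, eps=1e-12):
    """Mean negative Bernoulli log-likelihood of predicted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)

def auc(y, p):
    """AUC: chance that a treated unit is scored above an untreated one (ties count 0.5)."""
    treated = [pi for yi, pi in zip(y, p) if yi == 1]
    untreated = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in treated for b in untreated)
    return wins / (len(treated) * len(untreated))

# Hypothetical candidate library: each fit(X, y) returns a predict(rows) function.
def fit_marginal(X, y):
    rate = sum(y) / len(y)  # ignores covariates entirely
    return lambda rows: [rate for _ in rows]

def fit_logistic(X, y, steps=200, lr=0.5):
    w = [0.0] * (len(X[0]) + 1)  # intercept plus one weight per covariate
    def prob(row):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
        return 1.0 / (1.0 + math.exp(-z))
    for _ in range(steps):  # plain batch gradient descent
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            err = prob(row) - yi
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(y) for wj, gj in zip(w, grad)]
    return lambda rows: [prob(row) for row in rows]

def discrete_super_learner(X, y, library, k=5, seed=0):
    """Select the candidate with the lowest k-fold cross-validated risk, then refit it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    cv_pred = {name: [None] * len(y) for name in library}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for name, fit in library.items():
            preds = fit(X_tr, y_tr)([X[i] for i in fold])
            for i, p in zip(fold, preds):
                cv_pred[name][i] = p  # out-of-fold prediction for unit i
    cv_nll = {name: neg_log_lik(y, p) for name, p in cv_pred.items()}
    cv_auc = {name: auc(y, p) for name, p in cv_pred.items()}
    selected = min(cv_nll, key=cv_nll.get)
    return {"selected": selected, "cv_nll": cv_nll, "cv_auc": cv_auc,
            "predict": library[selected](X, y)}

# Simulated data (an assumption): one binary and one continuous baseline covariate.
rng = random.Random(1)
X = [[rng.randint(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y = [int(rng.random() < 1 / (1 + math.exp(-(1.5 * a + b - 0.5)))) for a, b in X]

library = {"marginal": fit_marginal, "logistic": fit_logistic}
result = discrete_super_learner(X, y, library)
print(result["selected"], result["cv_nll"], result["cv_auc"])
```

Scoring every candidate on out-of-fold predictions is what lets the selection adapt to the dataset at hand: a flexible learner that overfits is penalized on the held-out folds rather than rewarded for its in-sample fit.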