
Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study



Abstract

BACKGROUND: Pythia is an automated, clinically curated surgical data pipeline and repository housing all surgical patient electronic health record (EHR) data from a large, quaternary, multisite health institute for data science initiatives. In an effort to better identify high-risk surgical patients from complex data, a machine learning project trained on Pythia was built to predict postoperative complication risk.

METHODS AND FINDINGS: A curated data repository of surgical outcomes was created using automated SQL and R code that extracted and processed patient clinical and surgical data across 37 million clinical encounters from the EHRs. A total of 194 clinical features were constructed and aggregated, including patient demographics (e.g., age, sex, race), smoking status, medications, comorbidities, procedure information, and proxies for surgical complexity. A cohort of 66,370 patients who had undergone 99,755 invasive procedural encounters between January 1, 2014, and January 31, 2017, was studied further for the purpose of predicting postoperative complications. The average complication and 30-day postoperative mortality rates of this cohort were 16.0% and 0.51%, respectively. Least absolute shrinkage and selection operator (lasso) penalized logistic regression, random forest models, and extreme gradient boosted decision trees were trained on this surgical cohort with cross-validation on 14 specific postoperative outcome groupings. The resulting models had area under the receiver operating characteristic curve (AUC) values ranging between 0.747 and 0.924, calculated on an out-of-sample test set from the last 5 months of data. Lasso penalized regression was identified as a high-performing model that provides clinically interpretable, actionable insights. The highest and lowest performing lasso models predicted postoperative shock and genitourinary outcomes with AUCs of 0.924 (95% CI: 0.901, 0.946) and 0.780 (95% CI: 0.752, 0.810), respectively. A calculator requiring input of 9 data fields was created to produce a risk assessment for the 14 groupings of postoperative outcomes. A high-risk threshold (15% risk of any complication) was determined to identify high-risk surgical patients. The model's sensitivity was 76% and its specificity was 76%. Compared with expert-developed heuristics for identifying high-risk patients and with the ACS NSQIP calculator, this tool performed better, giving clinicians an improved way to estimate postoperative risk for patients. Limitations of this study include missing data, which were removed for the analysis.

CONCLUSIONS: Extracting and curating a large, local institution's EHR data for machine learning purposes resulted in models with strong predictive performance. These models can be used in clinical settings as decision support tools for identification of high-risk patients as well as patient evaluation and care management. Further work is necessary to evaluate the impact of the Pythia risk calculator within the clinical workflow on postoperative outcomes and to optimize this data flow for future machine learning efforts.
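The evaluation metrics named in the abstract (AUC, and sensitivity and specificity at a 15% risk threshold) can be illustrated with a minimal Python sketch. This is not the authors' code, which the abstract describes only as SQL and R. The function names and the toy risk scores below are hypothetical and made up for illustration; they are not study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    risks: Sequence[float], outcomes: Sequence[int], threshold: float = 0.15
) -> Tuple[float, float]:
    """Flag patients whose predicted risk is >= threshold, then score the flags.

    outcomes uses 1 for "had a complication" and 0 for "no complication".
    """
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= threshold
        if outcome == 1:
            if flagged:
                tp += 1  # high-risk flag, complication occurred
            else:
                fn += 1  # missed complication
        else:
            if flagged:
                fp += 1  # flagged, but no complication
            else:
                tn += 1  # correctly not flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is O(n^2),
    which is fine for a toy example but not for a large cohort.
    """
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): predicted risks and outcomes.
toy_risks = [0.02, 0.10, 0.20, 0.40, 0.05, 0.30, 0.12, 0.60]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

sens, spec = sensitivity_specificity(toy_risks, toy_outcomes, threshold=0.15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={auc(toy_risks, toy_outcomes):.4f}")
```

Raising the threshold flags fewer patients, which generally trades sensitivity for specificity. The AUC summarizes ranking quality across all thresholds and does not depend on which one is chosen.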
