Journal: JCO Clinical Cancer Informatics

Computing the Hazard Ratios Associated With Explanatory Variables Using Machine Learning Models of Survival Data


Abstract

PURPOSE: The application of Cox proportional hazards (CoxPH) models to survival data and the derivation of hazard ratios (HRs) are well established. Although nonlinear, tree-based machine learning (ML) models have been developed and applied to survival analysis, no methodology exists for computing HRs associated with explanatory variables from such models. We describe a novel way to compute HRs from tree-based ML models using SHapley Additive exPlanations (SHAP) values, a locally accurate and consistent methodology for quantifying the contribution of explanatory variables to predictions.

METHODS: We used three sets of publicly available survival data consisting of patients with colon, breast, or pan-cancer and compared the performance of CoxPH with the state-of-the-art ML model, XGBoost. To compute the HR for an explanatory variable from the XGBoost model, the SHAP values were exponentiated and the ratio of their means over the two subgroups was calculated. The CI was computed by bootstrapping the training data and regenerating the ML model 1,000 times. Across the three data sets, we systematically compared HRs for all explanatory variables. Open-source libraries in Python and R were used in the analyses.

RESULTS: For the colon and breast cancer data sets, the performance of CoxPH and XGBoost was comparable, and the computed HRs showed good consistency. In the pan-cancer data set, the two models agreed on most variables but gave opposite findings for two of the explanatory variables. Subsequent Kaplan-Meier plots supported the finding of the XGBoost model.

CONCLUSION: Enabling the derivation of HRs from ML models can help to improve the identification of risk factors from complex survival data sets and to enhance the prediction of clinical trial outcomes.
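The HR computation described in METHODS can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-patient SHAP values for one explanatory variable are already available on the log-hazard scale, and the names `hazard_ratio_from_shap`, `bootstrap_hr_ci`, `fit_and_explain`, and `is_exposed` are hypothetical. In a real analysis, `fit_and_explain` would refit the XGBoost model on each resample and return its SHAP values; here a toy stand-in is used so the example runs with the standard library alone.

```python
import math
import random
from statistics import fmean


def hazard_ratio_from_shap(shap_values, in_group):
    """Ratio of mean exp(SHAP) in the subgroup to mean exp(SHAP) in the reference group."""
    exposed = [math.exp(s) for s, g in zip(shap_values, in_group) if g]
    reference = [math.exp(s) for s, g in zip(shap_values, in_group) if not g]
    if not exposed or not reference:
        raise ValueError("both subgroups must be non-empty")
    return fmean(exposed) / fmean(reference)


def bootstrap_hr_ci(rows, fit_and_explain, is_exposed, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI: resample rows with replacement, refit, recompute the HR each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))
        flags = [is_exposed(r) for r in sample]
        if all(flags) or not any(flags):
            continue  # this resample lost one subgroup, so no ratio can be formed
        # Hypothetical hook: refit the model on `sample`, return one SHAP value per row.
        shap_values = fit_and_explain(sample)
        estimates.append(hazard_ratio_from_shap(shap_values, flags))
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    lower = estimates[int((alpha / 2) * (len(estimates) - 1))]
    upper = estimates[int((1 - alpha / 2) * (len(estimates) - 1))]
    return lower, upper


# Toy stand-in (assumption): a fixed SHAP contribution of +0.5 for treated
# patients and -0.5 for untreated ones, instead of a refitted model.
def toy_explainer(sample):
    return [0.5 if r["treated"] else -0.5 for r in sample]


rows = [{"treated": i % 2 == 0} for i in range(40)]
flags = [r["treated"] for r in rows]
hr = hazard_ratio_from_shap(toy_explainer(rows), flags)
ci = bootstrap_hr_ci(rows, toy_explainer, lambda r: r["treated"], n_boot=200)
```

With the toy explainer every resample yields the same ratio, so the interval collapses to a point; with a model refitted on each resample, the spread of the bootstrap estimates is what produces a nontrivial CI.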
