American Journal of Cancer Research

Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model


Abstract

Most patients with prostate cancer die of non-cancer causes, so it is important to predict multi-category cause of death (COD) accurately in these patients. Random forest (RF), a popular machine learning model, has been shown to be useful for predicting binary cancer-specific deaths. However, its accuracy for predicting multi-category COD in cancer patients is unclear. We included patients in the Surveillance, Epidemiology, and End Results-18 cancer registry program with prostate cancer diagnosed in 2004 (followed up through 2016). They were randomly divided into training and testing sets of equal size. We evaluated the prediction accuracies of RF and conventional statistical (multinomial) models for 6-category COD by data-encoding type, using a 2-fold cross-validation approach. Among 49,864 prostate cancer patients, 29,611 (59.4%) were alive at the end of follow-up, 5,448 (10.9%) died of cardiovascular disease, 4,607 (9.2%) of prostate cancer, 3,681 (7.4%) of non-prostate cancer, 717 (1.4%) of infection, and 5,800 (11.6%) of other causes. We predicted 6-category COD among these patients with a mean accuracy of 59.1% (n=240, 95% CI, 58.7%-59.4%) in RF models with one-hot encoding, and 50.4% (95% CI, 49.7%-51.0%) in multinomial models. Tumor characteristics, prostate-specific antigen level, and diagnosis-confirmation method were important in both RF and multinomial models. In RF models, no statistically significant differences were found between the accuracies of the training and cross-validation phases, or between those of categorical and one-hot encoding. We report that RF models can outperform multinomial logistic models (absolute accuracy difference, 8.7%) in predicting long-term 6-category COD among prostate cancer patients, while pathology diagnosis itself and tumor pathology remain important factors.
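The figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the authors' analysis. It recomputes the share of each outcome category from the reported counts, and the absolute accuracy difference from the two reported mean accuracies. The variable names are hypothetical; the numbers are copied from the abstract.

```python
# Counts by 6-category outcome, copied from the abstract.
cod_counts = {
    "alive": 29_611,
    "cardiovascular disease": 5_448,
    "prostate cancer": 4_607,
    "non-prostate cancer": 3_681,
    "infection": 717,
    "other causes": 5_800,
}

# The six categories should add up to the full cohort.
total = sum(cod_counts.values())

# Share of each category in percent, rounded to one decimal as in the abstract.
shares = {cause: round(100 * n / total, 1) for cause, n in cod_counts.items()}

# Mean accuracies (percent) reported for the two model families.
rf_accuracy = 59.1           # random forest, one-hot encoding
multinomial_accuracy = 50.4  # multinomial logistic model

# Absolute difference in percentage points.
accuracy_gap = round(rf_accuracy - multinomial_accuracy, 1)

for cause, pct in shares.items():
    print(f"{cause}: {cod_counts[cause]:,} ({pct}%)")
print(f"total: {total:,}")
print(f"absolute accuracy difference: {accuracy_gap} percentage points")
```

The counts sum to the stated cohort of 49,864, and the recomputed shares and the 8.7-point difference match the values quoted in the abstract.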
