Source: Briefings in Bioinformatics (U.S. National Institutes of Health literature)

Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data


Abstract

Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell’s concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
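The cross-validated survival distributions mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual k-fold scheme: the risk model is refit on each training fold, the held-out cases are assigned to risk groups by that model, and Kaplan-Meier curves are then computed from the pooled held-out assignments. The model in `fit_risk_model` is a deliberately crude, hypothetical stand-in (correlation-based feature selection with a median cutoff), not the modeling approach used in the article, and all function names here are illustrative. The data are pure noise, so any apparent separation between risk groups reflects overfitting rather than signal.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan-Meier survival estimate at each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def km_at(time, event, t0):
    """Kaplan-Meier survival probability at time t0 (step function)."""
    t, s = kaplan_meier(time, event)
    idx = np.searchsorted(t, t0, side="right") - 1
    return 1.0 if idx < 0 else float(s[idx])

def fit_risk_model(X, time, event, n_features=10):
    """Hypothetical toy model: keep the features most correlated with log
    survival time among observed events, and cut the score at its median."""
    event = np.asarray(event, dtype=bool)
    Xc = X[event] - X[event].mean(axis=0)
    yc = np.log(time[event]) - np.log(time[event]).mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    top = np.argsort(-np.abs(corr))[:n_features]
    weights = -np.sign(corr[top])  # shorter survival => higher risk score
    cutoff = np.median(X[:, top] @ weights)
    return top, weights, cutoff

def predict_high_risk(model, X):
    top, weights, cutoff = model
    return (X[:, top] @ weights) > cutoff

def cross_validated_groups(X, time, event, k=5, seed=0):
    """Assign every case to a risk group using a model that never saw it.
    Feature selection is repeated inside each fold."""
    rng = np.random.default_rng(seed)
    n = len(time)
    high = np.zeros(n, dtype=bool)
    for test_idx in np.array_split(rng.permutation(n), k):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        model = fit_risk_model(X[train], time[train], event[train])
        high[test_idx] = predict_high_risk(model, X[test_idx])
    return high

# Simulated noise: 100 cases, 2000 features unrelated to survival.
rng = np.random.default_rng(1)
n, p = 100, 2000
X = rng.standard_normal((n, p))
time = rng.exponential(1.0, n)
event = rng.random(n) < 0.7  # roughly 30% of cases treated as censored

resub_high = predict_high_risk(fit_risk_model(X, time, event), X)
cv_high = cross_validated_groups(X, time, event)

for label, high in [("re-substitution", resub_high), ("cross-validated", cv_high)]:
    low_s = km_at(time[~high], event[~high], 1.0)
    high_s = km_at(time[high], event[high], 1.0)
    print(f"{label}: S(1.0) low risk = {low_s:.2f}, high risk = {high_s:.2f}")
```

Comparing the two printed lines gives a rough sense of how much of the re-substitution separation survives once each case is classified by a model fit without it. The key design point is that every data-dependent step, including feature selection and the choice of cutoff, sits inside the fold loop.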
