
Using the C-index to measure prediction accuracy and variable importance of random forests, with application to tissue microarray data.

Abstract

Tissue microarray (TMA) is a state-of-the-art technique for high-throughput molecular analysis of large numbers of tumor samples in a single staining reaction. TMAs allow one to evaluate highly specialized tumor marker expression patterns, which may lead to improved diagnostic, prognostic and therapeutic applications in the clinic.

Typically, TMA data have relatively few observations but many highly skewed and correlated covariates with weak marginal effects. Random forest (RF) predictors [Bre01] are known to achieve improved accuracy on such data. In this thesis, we propose the C-index as an alternative to the error rate for measuring the prediction accuracy of RF predictors. Unlike the error rate, the C-index compares the overall distribution of the posterior predictions and sidesteps the need to specify a cost function and a classification threshold. We prove that, in certain situations, the C-index is far superior to the error rate for determining the prediction accuracy and variable importance of RF predictors. We also introduce a C-margin to measure the prediction strength of individual observations, and based on these C-margins we propose new measures of variable importance. We show that the C-margin-based importance measures are superior to the current E-margin-based importance measures and to the Gini index, especially when class prevalence is unbalanced. We apply the proposed methods to benchmark data from the UCI repository and to our simulated data.

We extend the use of the C-index and C-margins to other important settings, such as data with continuous outcomes and censored outcomes, and find that the C-margin-based variable importance measures often outperform existing measures. Furthermore, we extend the local full likelihood method proposed by LeBlanc and Crowley [LC92] to construct residual-based survival random forest predictors. Using the C-index and the C-margins, we find that our residual-based survival random forest predictor outperforms Breiman's survival random forest predictor (2001), especially in identifying important covariates.
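The abstract's contrast between the error rate and the C-index can be seen on toy data. The following is a minimal pure-Python sketch, not code from the thesis. For a binary outcome it assumes the C-index is the usual pairwise concordance, which is equivalent to the area under the ROC curve. For censored outcomes it uses Harrell's C, one standard definition that may differ from the one used in the thesis. The function names (`c_index_binary`, `error_rate`, `harrell_c`) and the data are hypothetical.

```python
from itertools import product


def c_index_binary(scores, labels):
    """Pairwise concordance for a 0/1 outcome (equivalent to ROC AUC).

    Every (positive, negative) pair scores 1 if the positive case gets the
    higher predicted score, 0.5 on a tie and 0 otherwise. No threshold or
    cost function is involved.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    hits = 0.0
    for p, n in product(pos, neg):
        if p > n:
            hits += 1.0
        elif p == n:
            hits += 0.5
    return hits / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Misclassification rate after cutting the scores at `threshold`."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


def harrell_c(times, events, risk):
    """Harrell's C for right-censored data (events: 1 = observed, 0 = censored).

    A pair is usable when the subject with the shorter follow-up time had an
    observed event. It is concordant when that subject also has the higher
    predicted risk (ties in risk count 0.5). Tied times are skipped here for
    simplicity.
    """
    hits, usable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    hits += 1.0
                elif risk[i] == risk[j]:
                    hits += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return hits / usable


# Toy data (made up): 1 positive among 10 cases, i.e. unbalanced prevalence.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ranker = [0.40, 0.30, 0.25, 0.20, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]
constant = [0.10] * 10

# At threshold 0.5 both predictors call every case negative, so their error rates tie...
print(error_rate(ranker, labels), error_rate(constant, labels))  # 0.1 0.1
# ...while the C-index separates a perfect ranking from an uninformative one.
print(c_index_binary(ranker, labels), c_index_binary(constant, labels))  # 1.0 0.5

print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # 1.0
```

With one positive in ten cases and a 0.5 threshold, the error rate cannot tell a perfect ranker from a constant predictor. The C-index avoids this dependence on the threshold, as the abstract describes.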
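The abstract also uses the C-index to judge variable importance. The sketch below is a hedged illustration only. It adapts generic permutation importance so that it records the drop in C-index after shuffling each covariate, rather than the rise in error rate. It is not the thesis's C-margin-based measure, which the abstract does not define. `predict` stands in for a fitted random forest, and `toy_model`, `permutation_importance_c` and the data are hypothetical.

```python
import random


def c_index(scores, labels):
    """Binary C-index: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    hits = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return hits / (len(pos) * len(neg))


def permutation_importance_c(predict, X, y, n_repeats=20, seed=0):
    """Average drop in C-index when one column of X is randomly shuffled.

    `predict` maps a list of rows to a list of scores; here it is a stand-in
    for a fitted random forest. A larger drop means the model relies more on
    that covariate.
    """
    rng = random.Random(seed)
    baseline = c_index(predict(X), y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            drops.append(baseline - c_index(predict(X_perm), y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(rows):
    # Hypothetical "fitted" model whose score depends only on covariate 0.
    return [row[0] for row in rows]


X = [[float(i), float((7 * i) % 10)] for i in range(10)]
y = [1 if i >= 7 else 0 for i in range(10)]
# Covariate 0 gets a positive importance; covariate 1 gets exactly 0.0
# because the model ignores it.
print(permutation_importance_c(toy_model, X, y))
```

Breiman's original permutation importance shuffles covariates in the out-of-bag samples of each tree. This sketch scores whatever data it is given, which keeps it short but omits the out-of-bag bookkeeping.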

Bibliographic details

  • Author

    Huang, Yunda

  • Affiliation

    University of California, Los Angeles

  • Degree-granting institution: University of California, Los Angeles
  • Subject: Biology, Biostatistics
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 98
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Biomathematical methods
