Source: Computing Science and Statistics (conference proceedings)

Assessing Patient Survival Using Microarray Gene Expression Data Via Partial Least Squares Proportional Hazard Regression

Abstract

High-dimensional data sets from microarray experiments, in which the number of variables (genes) p far exceeds the number of samples N, render most traditional statistical tools of little direct use. However, some of these tools can be effective when used in conjunction with an appropriate dimension reduction method. In this paper we introduce the use of proportional hazard (PH) regression (Cox 1972) in conjunction with dimension reduction by partial least squares (PLS) for the case where the number of covariates p exceeds the number of samples N, a setting typical of gene expression data from DNA microarrays. Specifically, for a given vector of response values, which are times to event (death or censoring times), and p gene expressions (covariates), we address the issue of how to assess (estimate) the survival experience (curve) when N << p. The approach taken to cope with the high dimensionality is to reduce the dimension via a dimension reduction (component extraction) method in the first stage and then estimate the survival distribution using a PH regression model in the second stage. The primary method of component extraction considered is PLS. PLS achieves dimension reduction by sequentially constructing components that maximize the covariance between the response (survival times) and linear combinations of the covariates (gene expressions). This is analogous to principal components analysis (PCA), except that the optimization criterion in PCA is variance, whereas in PLS it is covariance. We demonstrate the methodology on a diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data set.
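The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the source: the data are simulated, only the first PLS component is extracted, the PLS response is the log of the observed time (censoring is ignored in stage one), event times are assumed to have no ties, and the function names `pls_first_component`, `cox_one_covariate` and `baseline_survival` are hypothetical. Only the standard library is used so that the example stays self-contained.

```python
import math
import random

def pls_first_component(X, y):
    """Stage 1: unit weight vector w and standardized scores t of the first PLS component.

    For a single response, the w with ||w|| = 1 that maximizes cov(Xw, y)
    is proportional to X^T y (columns of X and y centered).
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    sd = math.sqrt(sum(v * v for v in t) / (n - 1))  # scores already have mean zero
    return w, [v / sd for v in t]

def cox_one_covariate(time, event, z, iters=50, tol=1e-8):
    """Stage 2: maximize the Cox partial likelihood in one coefficient (damped Newton)."""
    beta, n = 0.0, len(time)
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored subjects contribute only through risk sets
            risk = [j for j in range(n) if time[j] >= time[i]]
            s0 = sum(math.exp(beta * z[j]) for j in risk)
            s1 = sum(z[j] * math.exp(beta * z[j]) for j in risk)
            s2 = sum(z[j] ** 2 * math.exp(beta * z[j]) for j in risk)
            grad += z[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info < 1e-12:
            break
        step = max(-1.0, min(1.0, grad / info))  # clip the step for stability
        beta += step
        if abs(step) < tol:
            break
    return beta

def baseline_survival(time, event, z, beta):
    """Breslow estimate of the baseline survival curve at each event time."""
    n, cum_hazard, curve = len(time), 0.0, []
    for i in sorted(range(n), key=lambda k: time[k]):
        if event[i]:
            s0 = sum(math.exp(beta * z[j]) for j in range(n) if time[j] >= time[i])
            cum_hazard += 1.0 / s0
            curve.append((time[i], math.exp(-cum_hazard)))
    return curve

# Simulated data (assumption): N = 40 samples, p = 200 genes, so N << p.
random.seed(0)
N, p = 40, 200
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(N)]
# Hypothetical truth: the first five genes raise the hazard.
death = [random.expovariate(math.exp(0.5 * sum(row[:5]))) for row in X]
censor = [random.expovariate(0.3) for _ in range(N)]
time = [min(d, c) for d, c in zip(death, censor)]
event = [d <= c for d, c in zip(death, censor)]

w, t = pls_first_component(X, [math.log(v) for v in time])
beta = cox_one_covariate(time, event, t)
curve = baseline_survival(time, event, t, beta)
print("Cox coefficient on the PLS score:", round(beta, 3))
print("baseline survival at the first event times:", curve[:3])
```

Because the PLS score is built to covary positively with log survival time, the fitted coefficient is expected to be negative (a higher score corresponds to a lower hazard). With N << p the score is also fitted to the same samples it is evaluated on, so a real analysis would choose the number of components and judge the fit on held-out data rather than in-sample as done here.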