New methods for variable selection with applications to survival analysis and statistical redundancy analysis using gene expression data.



Abstract

An important application of microarray research is to develop cancer diagnostic and prognostic tools based on tumor genetic profiles. For ease of interpretation, such studies aim to identify, from thousands of genes or more, a small subset with which to build molecular predictors of clinical outcomes. They therefore require methodologies that can model high-dimensional covariates and perform variable selection simultaneously.

One area of interest is modeling cancer patients' survival time, or time to cancer recurrence, with gene expression data. In the first part of this dissertation, we propose a new penalized weighted least squares method for model estimation and variable selection in accelerated failure time models. In this method, right-censored observations are used as censoring constraints in optimizing the weighted least squares objective function. We also include a ridge penalty to deal with the singularity caused by collinearity and high dimensionality, and use the least absolute shrinkage and selection operator (LASSO) to achieve model parsimony. Simulation studies demonstrate that adding censoring constraints improves model estimation and variable selection, especially for data with high-dimensional covariates. Real data examples show that our method is able to identify genes that are relevant to patient survival times.

Another area of interest is cancer subtype classification using gene expression profiles. One important issue is reducing the redundancy caused by correlation among genes. Since genes with correlated expression levels may be co-expressed or belong to the same biological pathway related to the disease, including such genes in a classifier provides very little additional information. In the second part of the dissertation, we define an eigenvalue-ratio statistic to measure a gene's contribution to the joint discriminability of a set of genes. Based on this statistic, we define a novel hypothesis test for the statistical redundancy of genes and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and the gene selection methods. Real data examples show the effectiveness of our eigenvalue-ratio-based gene selection methods. We also demonstrate that the selected compact gene subsets not only can be used to build high-quality cancer classifiers but also have biological relevance.
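
As a reading aid, the penalized weighted least squares problem described in the first part of the abstract might be written roughly as follows. This is only a sketch under stated assumptions, not the dissertation's actual formulation: the abstract does not give the weights, the exact form of the censoring constraints, or any notation. Here $y_i$ is assumed to be the log of the observed time for subject $i$, $\delta_i$ the event indicator ($1$ if the event was observed, $0$ if right censored), $x_i$ the gene expression vector, $w_i$ an unspecified nonnegative weight, and $\lambda_1, \lambda_2 \ge 0$ the LASSO and ridge tuning parameters.

```latex
% Illustrative sketch only: all symbols and the constraint form are assumptions,
% not taken from the dissertation.
\begin{aligned}
\min_{\alpha,\,\beta}\quad
  & \frac{1}{2}\sum_{i:\,\delta_i = 1} w_i \,\bigl(y_i - \alpha - x_i^{\top}\beta\bigr)^2
    + \lambda_2 \lVert \beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1 \\
\text{subject to}\quad
  & \alpha + x_i^{\top}\beta \;\ge\; y_i
    \quad \text{for all } i \text{ with } \delta_i = 0 .
\end{aligned}
```

Under this reading, each right-censored subject contributes an inequality rather than a squared-error term: the true survival time is known only to be at least the censoring time, so the fitted log time is required to be no smaller than the observed log censoring time. The ridge term keeps the problem well posed when genes are collinear or outnumber subjects, and the $\ell_1$ term sets some coefficients exactly to zero, which is what performs the variable selection.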

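Bibliographic record

  • Author

    Hu, Simin

  • Author affiliation

    Case Western Reserve University

  • Degree-granting institution: Case Western Reserve University
  • Subject: Biology, Biostatistics
  • Degree: Ph.D.
  • Year: 2007
  • Pagination: 154 p.
  • Total pages: 154
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Biomathematical methods
  • Keywords

  • Date added to database: 2022-08-17 11:39:42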
