Discriminant analysis using multi-gene profiles in molecular classification of breast cancer.

Abstract

Gene expression data derived from microarrays provide a promising tool for the molecular diagnosis of cancer. However, because such data are high-dimensional and complex, it is challenging to find a reduced set of "informative genes" before a formal classification analysis. In the past few years, many marginal single-gene statistical measures have been applied to expression data even though gene-gene interactions are non-negligible. In this thesis, in order to capture the interactions among genes, we propose to study methods based on two statistics: the "Gene Profile Association Score" (GPAS) and the "signed Gene Profile Association Score" (sGPAS). Each statistic is designed to capture high-order gene associations through a similar iterative screening process, so that not only genes with marginal significance but also genes that carry interaction information are detected. We also build linear prediction models for both GPAS and sGPAS, evaluate their performance on a real microarray data set of 78 breast cancer patients, and compare the results with various existing supervised classification methods. Under 13-fold cross-validation, our proposed statistics empirically outperform all of the marginal predictors considered. In addition, they detect several oncogenes that have large marginal p-values and that marginal feature selection measures have therefore not characterized. Our findings indicate that GPAS and sGPAS may become very useful methods for exploring the complexity of microarray data in the future. These statistics can also be applied to general association-related, high-dimensional pattern recognition problems. We also provide theoretical proofs supporting statistical inference for these scores.
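
To make the evaluation setting concrete, the following is a minimal sketch of the kind of marginal baseline the abstract compares against: genes are ranked one at a time, a linear classifier is fit on the top-ranked genes, and accuracy is estimated by 13-fold cross-validation. It does not implement GPAS or sGPAS, whose definitions are not given here. The Welch t-statistic, the nearest-centroid classifier, the function names, the choice of 10 genes, and the synthetic data (78 samples, 200 genes) are all assumptions made for illustration.

```python
import numpy as np

def welch_t(X, y):
    """Per-gene two-sample Welch t-statistic (a marginal, single-gene score)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the closer class centroid (a linear rule)."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def cv_accuracy(X, y, n_folds=13, top_k=10, seed=0):
    """K-fold CV accuracy with gene selection repeated inside every fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Rank genes on the training samples only, to avoid selection bias.
        scores = np.abs(welch_t(X[train], y[train]))
        genes = np.argsort(scores)[-top_k:]
        pred = nearest_centroid_predict(
            X[train][:, genes], y[train], X[test_idx][:, genes]
        )
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)

# Synthetic stand-in for expression data (NOT the thesis data set):
# 78 samples, 200 genes, of which the first 5 are shifted in class 1.
rng = np.random.default_rng(42)
n_samples, n_genes = 78, 200
y = rng.permutation(np.arange(n_samples) % 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 2.0

accuracy = cv_accuracy(X, y)
print(f"13-fold CV accuracy: {accuracy:.2f}")
```

With 78 samples, 13 folds leave exactly 6 samples out per fold. A score of this marginal kind ranks each gene in isolation, so a gene whose effect appears only in combination with other genes can receive a large p-value and be dropped, which is the limitation the abstract says GPAS and sGPAS are meant to address.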

Bibliographic record

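  • Author: Yan, Xin
  • Author affiliation: Columbia University
  • Degree-granting institution: Columbia University
  • Subjects: Statistics; Biology, Biostatistics; Biology, Genetics; Health Sciences, Oncology
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 105
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Statistics; Biomathematical methods; Genetics; Oncology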
