Computationally intensive statistical methods for analysis of gene expression data.

Abstract

In recent years, gene expression experiments have become increasingly common in molecular biology and biomedical research. Using high-density array technologies, researchers can now simultaneously measure the expression of thousands of genes in one or more samples. Typically, array experiments are performed on a sample of subjects (e.g., patients, cells, mice) drawn from a population of interest. Because the number of genes studied far exceeds the number of samples, statistical rigor is particularly important in this setting, and new statistical methods are needed for appropriate and accurate data analysis.

Questions of interest include how to identify (i) statistically significant subsets of genes (e.g., genes differentially expressed in two populations); (ii) groups of genes whose expression patterns across subjects are significantly correlated, since such genes might be part of the same causal mechanism or pathway; (iii) subpopulations of subjects whose gene expression profiles are significantly correlated; and (iv) groups of genes whose expression patterns have a significantly similar association with an outcome (e.g., survival, disease progression, phenotype). Each of these problems provides an opportunity for statistical inference, and methods are needed that adequately address the high dimensionality of the data.

We describe a general statistical framework for analysis of gene expression data, including bootstrap methods for assessing the reliability and repeatability of an experiment. We also propose specific new methods for multiple hypothesis testing (question (i)), for clustering genes and subjects, possibly simultaneously (questions (ii) and (iii)), and for supervised clustering (question (iv)). The asymptotic validity and finite-sample performance of these computationally intensive statistical techniques are studied in simulations. Their power to answer biologically important questions is then demonstrated on a collection of experimental data sets, some of which are publicly available.
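The abstract does not spell out the proposed multiple-testing procedure. For reference on question (i) only, the sketch below shows a common, standard approach that is not the dissertation's method. It computes a Welch t-statistic for each gene, derives p-values by permuting subject labels, and applies the Benjamini-Hochberg false discovery rate adjustment. The function names (`welch_t`, `permutation_pvalues`, `benjamini_hochberg`) and the simulated data are hypothetical illustrations. The code assumes a genes-by-subjects expression matrix and a boolean label per subject.

```python
import numpy as np

def welch_t(x, labels):
    """Per-gene Welch two-sample t-statistics.

    x: genes x subjects expression matrix (hypothetical layout).
    labels: boolean array over subjects (True = population 1).
    """
    a, b = x[:, labels], x[:, ~labels]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se

def permutation_pvalues(x, labels, n_perm=1000, seed=0):
    """Two-sided per-gene p-values obtained by shuffling subject labels."""
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(x, labels))
    exceed = np.zeros(x.shape[0])
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # shuffled copy; group sizes preserved
        exceed += np.abs(welch_t(x, perm)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

def benjamini_hochberg(p):
    """Benjamini-Hochberg step-up adjusted p-values (FDR control)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down enforces monotonicity.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Hypothetical demo: 200 genes, 10 subjects per population,
# with the first 20 genes shifted upward in population 1.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 20))
labels = np.array([True] * 10 + [False] * 10)
x[:20, labels] += 3.0
q = benjamini_hochberg(permutation_pvalues(x, labels, n_perm=1000))
print("genes with q < 0.05:", np.flatnonzero(q < 0.05))
```

Permuting labels avoids distributional assumptions about each gene's statistic. The FDR adjustment addresses the setting the abstract emphasizes, where thousands of genes are tested on comparatively few subjects. Step-down or resampling-based family-wise error procedures would be equally valid choices for comparison.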