
Statistical Methods for Normalization and Analysis of High-Throughput Genomic Data.


Abstract

High-throughput genomic datasets obtained from microarray and sequencing studies have revolutionized molecular biology over the last decade. The complexity of these technologies also poses new challenges for statisticians, who must separate biologically relevant information from technical noise. Two methods are introduced that address important issues in the normalization of array comparative genomic hybridization (aCGH) microarrays and in the analysis of RNA sequencing (RNA-Seq) studies.

Many cancer and genetic studies that investigate copy number aberrations at the DNA level use comparative genomic hybridization (CGH) on oligonucleotide arrays. However, aCGH data often suffer from low signal-to-noise ratios, which results in poor resolution of fine features. Bilke et al. [11] showed that the commonly used running-average noise reduction strategy performs poorly when errors are dominated by systematic components. A method called pcaCGH is proposed that significantly reduces noise by first estimating systematic bias with a non-parametric regression on technical covariates of the probes. A robust principal components analysis (PCA) then estimates any remaining systematic bias not explained by the technical covariates used in the preceding regression. The proposed algorithm is demonstrated on two CGH datasets measuring the NCI-60 cell lines with NimbleGen and Agilent microarrays. The method achieves a nominal error variance reduction of 60%-65% as well as an average 2-fold increase in signal-to-noise ratio, resulting in more detailed copy number estimates. Furthermore, correlations of signal intensity ratios between NimbleGen and Agilent arrays increase by 40% on average, indicating a significant improvement in agreement between the two technologies.

A second algorithm, called gamSeq, is introduced to test for differential gene expression in RNA sequencing studies. Limitations of existing methods are outlined, and the proposed algorithm is compared to these existing algorithms.
Simulation studies and real data are used to show that gamSeq improves upon existing methods with regard to type I error control while maintaining similar or better power across a range of sample sizes for RNA-Seq studies. The proposed method is also applied to detect differential 3' UTR usage.
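The two pcaCGH normalization steps summarized in the abstract (a non-parametric regression of probe log ratios on technical covariates, followed by PCA on the remaining bias) can be illustrated with a minimal pure-Python sketch. It rests on assumptions the abstract does not specify: a single hypothetical covariate (probe GC content), a binned-median smoother as the non-parametric regression, and a plain, non-robust power-iteration estimate of the leading component in place of the robust PCA. All function names and the synthetic data are hypothetical; this is not the dissertation's implementation.

```python
import math
import random
import statistics
from typing import List, Sequence

Matrix = List[List[float]]  # rows = arrays, columns = probes


def binned_median_fit(covariate: Sequence[float], values: Sequence[float],
                      n_bins: int = 10) -> List[float]:
    """Hypothetical non-parametric fit: each probe gets the median of its covariate bin.

    Probes are sorted by the covariate and split into n_bins quantile bins
    of roughly equal size.
    """
    order = sorted(range(len(values)), key=lambda i: covariate[i])
    bin_size = max(1, math.ceil(len(order) / n_bins))
    fitted = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        level = statistics.median([values[i] for i in members])
        for i in members:
            fitted[i] = level
    return fitted


def remove_top_component(matrix: Matrix, n_iter: int = 100, seed: int = 0) -> Matrix:
    """Subtract the leading rank-1 component shared across arrays.

    Uses plain power iteration on R^T R (an uncentered, non-robust SVD step),
    standing in for the robust PCA described in the abstract.
    """
    n_probes = len(matrix[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_probes)]
    for _ in range(n_iter):
        scores = [sum(x * vi for x, vi in zip(row, v)) for row in matrix]  # R v
        v = [sum(s * row[i] for s, row in zip(scores, matrix))             # R^T (R v)
             for i in range(n_probes)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # nothing left to remove
            return [list(row) for row in matrix]
        v = [x / norm for x in v]
    cleaned = []
    for row in matrix:
        weight = sum(x * vi for x, vi in zip(row, v))  # this array's loading on the shared bias pattern
        cleaned.append([x - weight * vi for x, vi in zip(row, v)])
    return cleaned


def normalize(log_ratios: Matrix, covariate: Sequence[float]) -> Matrix:
    """Step 1: per-array covariate regression. Step 2: remove the leading shared component."""
    residuals = []
    for row in log_ratios:
        fitted = binned_median_fit(covariate, row)
        residuals.append([x - f for x, f in zip(row, fitted)])
    return remove_top_component(residuals)


def mean_square(matrix: Matrix) -> float:
    return statistics.fmean(x * x for row in matrix for x in row)


if __name__ == "__main__":
    rng = random.Random(1)
    n_probes, n_arrays = 500, 6
    gc = [rng.random() for _ in range(n_probes)]           # hypothetical probe GC content
    pattern = [math.sin(i / 10) for i in range(n_probes)]  # synthetic bias not explained by GC
    # Copy-neutral synthetic arrays: GC bias + shared probe pattern + random noise.
    raw = [[0.8 * (1 + 0.2 * j) * (g - 0.5) + (0.3 + 0.1 * j) * p + rng.gauss(0.0, 0.1)
            for g, p in zip(gc, pattern)]
           for j in range(n_arrays)]
    cleaned = normalize(raw, gc)
    print(f"mean-square reduction: {1 - mean_square(cleaned) / mean_square(raw):.0%}")
```

The binned median is used only because it is short and insensitive to outlying probes; a loess-type smoother over several covariates would be a closer analogue of a full non-parametric regression. The reduction printed by the demo reflects the synthetic data and says nothing about the 60%-65% figure reported for the NCI-60 arrays.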
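The abstract reports gamSeq's type I error control from simulation studies but does not describe the gamSeq model, so the sketch below does not implement gamSeq. It only shows, under stated assumptions, how an empirical type I error rate is commonly estimated: simulate counts in which no gene is differentially expressed, apply a test, and record how often it rejects at level alpha. The Poisson counts, the log1p transform, and the exact two-group permutation test on mean differences are stand-ins chosen for brevity, and all function names are hypothetical.

```python
import itertools
import math
import random
import statistics
from typing import List


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def permutation_pvalue(group_a: List[float], group_b: List[float]) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    extreme = n_perm = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance so floating-point ties count as extreme
            extreme += 1
        n_perm += 1
    return extreme / n_perm


def empirical_type1_error(n_sims: int = 200, n_per_group: int = 5,
                          mean_count: float = 20.0, alpha: float = 0.05,
                          seed: int = 42) -> float:
    """Fraction of null simulations in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # The null is true: both groups share one mean count, so every rejection is a false positive.
        a = [math.log1p(poisson_draw(rng, mean_count)) for _ in range(n_per_group)]
        b = [math.log1p(poisson_draw(rng, mean_count)) for _ in range(n_per_group)]
        if permutation_pvalue(a, b) <= alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(f"empirical type I error at alpha = 0.05: {empirical_type1_error():.3f}")
```

Because an exact permutation test is valid under the null, its rejection rate should stay at or below alpha apart from simulation noise; a method whose rate sits well above alpha in such a simulation is failing to control type I error. Adding a true difference between the group means and rerunning the same loop would estimate power instead.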

Bibliographic details

  • Author: Guennel, Tobias.
  • Affiliation: Virginia Commonwealth University.
  • Degree-granting institution: Virginia Commonwealth University.
  • Subject: Biology, Biostatistics; Statistics.
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 156
  • Original format: PDF
  • Language: English
