
Two Topics in Association Analysis of DNA Sequencing Data: Population Structure and Multivariate Traits.



Abstract

As next-generation sequencing technologies become mature and affordable, we now have access to massive data on single nucleotide variants (SNVs) with varying minor allele frequencies (MAFs). This creates new opportunities, as more information from the human genome becomes available. However, it also raises new challenges, such as how to make use of SNVs with low MAFs. With current intensive efforts in association testing to detect genetic loci associated with common diseases and complex traits, two issues are of primary interest: reducing spurious findings and increasing power for true discoveries.

In association testing, a major cause of an elevated false-positive rate is the confounding effect of population structure, the so-called population stratification. As a remedy, one popular method is to add principal components (PCs) as covariates in a regression model, known as principal component regression (PCR).

However, it is not clear how well PCR works when testing rare variants (RVs, with MAF < 0.01) or under fine-scale population stratification. Further questions arise, such as which types and sets of SNVs should be used to construct PCs, and whether there are better methods than principal component analysis (PCA) for constructing them. Using DNA sequencing data from the 1000 Genomes Project, we first investigate whether PCR adequately adjusts for population stratification while maintaining high power when testing low-frequency variants (LFVs, with 0.01 ≤ MAF < 0.05) and RVs. Furthermore, we compare the performance of two dimension reduction methods, PCA and spectral dimension reduction (SDR), as well as twelve different types and sets of variants for constructing PCs, with respect to controlling fine-scale population stratification.

On the other hand, linear mixed models (LMMs) have emerged as an approach with superior performance in handling complex population structures. Herein, we examine the connections and differences between PCR and LMMs based on the formulation of probabilistic PCA, and propose a hybrid method that combines the two. Simulations using the Genetic Analysis Workshop (GAW) 18 data and the 1000 Genomes Project data establish its outstanding performance in addressing both population structure and environmental confounders.

Lastly, we consider boosting the power of association analysis for multivariate traits. A new class of tests, the sum of powered score (SPU) tests, and an adaptive SPU (aSPU) test are extended to the generalized estimating equations (GEE) framework. We apply the new and several existing methods to association testing of both common variants (CVs) and RVs, using an HIV/AIDS dataset and the GAW 18 data.
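The PCR adjustment summarized above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the dissertation's implementation: genotypes are assumed to be 0/1/2 allele counts, the SNVs used to build PCs are standardized by their allele frequency (EIGENSTRAT-style), the trait is quantitative and fitted by ordinary least squares, and the p-value uses a normal approximation. All function names (`standardize`, `top_pcs`, `residualize`, `pcr_test`) are hypothetical. PCs are taken as the leading eigenvectors of the sample relationship matrix K = XXᵀ/m, found by power iteration, and the SNV effect in y ~ 1 + g + PCs is obtained by projecting the intercept and PCs out of both y and g (the Frisch-Waugh-Lovell theorem).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standardize(genotypes):
    """Center and scale each SNV (0/1/2 coding) by its allele frequency.

    genotypes: n rows (samples) x m columns (SNVs). Monomorphic SNVs are
    skipped because they carry no information about structure.
    """
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)                      # sample allele frequency
        if 0.0 < p < 1.0:
            sd = math.sqrt(p * (1.0 - p))
            cols.append([(x - 2 * p) / sd for x in col])
    if not cols:
        raise ValueError("no polymorphic SNVs to build PCs from")
    return cols


def top_pcs(genotypes, k, iters=200):
    """Leading k eigenvectors of K = X X^T / m, found by power iteration."""
    cols = standardize(genotypes)
    n, m = len(genotypes), len(cols)
    K = [[sum(c[a] * c[b] for c in cols) / m for b in range(n)] for a in range(n)]
    pcs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]        # fixed start (assumed not orthogonal to the PC)
        for _ in range(iters):
            w = [dot(K[a], v) for a in range(n)]
            for u in pcs:                            # Gram-Schmidt against earlier PCs
                d = dot(w, u)
                w = [wi - d * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("no variance left for another PC")
            v = [wi / norm for wi in w]
        pcs.append(v)
    return pcs


def residualize(x, basis):
    """Remove the mean, then the projection onto orthonormal PCs.

    Centered SNV columns make K annihilate the all-ones vector, so the PCs
    are orthogonal to the intercept and the projections can be done in turn.
    """
    mean = sum(x) / len(x)
    r = [xi - mean for xi in x]
    for u in basis:
        d = dot(r, u)
        r = [ri - d * ui for ri, ui in zip(r, u)]
    return r


def pcr_test(y, g, pcs):
    """OLS Wald test of SNV g in y ~ 1 + g + PCs; returns (beta, z, p)."""
    ry, rg = residualize(y, pcs), residualize(g, pcs)
    sxx = dot(rg, rg)
    if sxx < 1e-12:
        raise ValueError("SNV is collinear with the intercept and PCs")
    beta = dot(rg, ry) / sxx
    resid = [a - beta * b for a, b in zip(ry, rg)]
    df = len(y) - 2 - len(pcs)                      # intercept + SNV + k PCs
    se = math.sqrt(dot(resid, resid) / df / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided, normal approximation
    return beta, z, p


if __name__ == "__main__":
    pop_b = [0, 0, 0, 0, 1, 1, 1, 1]                 # toy labels: 4 + 4 samples
    structure = [[2 * b] * 3 for b in pop_b]         # 3 fully differentiated SNVs
    y = [b + e for b, e in zip(pop_b, [0, 1, 0, 1, 0, 1, 0, 1])]
    g = [0, 0, 0, 1, 1, 2, 2, 2]                     # test SNV, enriched in pop B
    pcs = top_pcs(structure, k=1)
    for label, basis in (("unadjusted", []), ("PC-adjusted", pcs)):
        beta, z, p = pcr_test(y, g, basis)
        print(f"{label}: beta={beta:.3f} z={z:.3f} p={p:.4f}")
```

In the toy data the trait mean differs between the two populations and the test SNV is more frequent in one of them. Adding the top PC as a covariate reduces |z| from 2√3 ≈ 3.46 to √2.5 ≈ 1.58, because the part of the association driven by population membership is removed. For a binary trait one would typically swap OLS for logistic regression; that is outside this sketch.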

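The SPU and aSPU tests for multivariate traits mentioned in the abstract can be sketched as follows. This sketch assumes that the GEE score vector U for one SNV across the traits, and an estimate Σ of its null covariance, have already been computed (the GEE fit itself is not shown), and it approximates the null distribution by simulating draws from N(0, Σ). Here SPU(γ) = Σ_j U_j^γ, with SPU(∞) = max_j |U_j|, and aSPU takes the minimum SPU p-value over γ and calibrates it against the same simulated draws. The γ set {1, 2, 4, 8, ∞}, the number of draws, and the function names (`cholesky`, `spu`, `tail_prob`, `aspu`) are illustrative assumptions, not taken from the dissertation.

```python
import bisect
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def spu(u, gamma):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf gives max_j |U_j|."""
    if gamma == math.inf:
        return max(abs(x) for x in u)
    return sum(x ** gamma for x in u)


def tail_prob(sorted_null, t):
    """Fraction of (ascending) null statistics that are >= t."""
    return (len(sorted_null) - bisect.bisect_left(sorted_null, t)) / len(sorted_null)


def aspu(u, cov, gammas=(1, 2, 4, 8, math.inf), n_sim=2000, seed=1):
    """Return ({gamma: SPU p-value}, aSPU p-value) for score vector u.

    Assumes u ~ N(0, cov) under H0. |SPU| is used, so odd gammas are
    two-sided. A p-value of 0 means "smaller than 1 / n_sim".
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(u)
    sims = []
    for _ in range(n_sim):                          # null score draws L z, z ~ N(0, I)
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sims.append([sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)])
    null = {g: [abs(spu(s, g)) for s in sims] for g in gammas}
    ranked = {g: sorted(vals) for g, vals in null.items()}
    p_spu = {g: tail_prob(ranked[g], abs(spu(u, g))) for g in gammas}
    min_p = min(p_spu.values())
    # Null distribution of the minimum p-value, reusing the same draws.
    null_min_p = [min(tail_prob(ranked[g], null[g][b]) for g in gammas)
                  for b in range(n_sim)]
    p_aspu = (1 + sum(mp <= min_p for mp in null_min_p)) / (n_sim + 1)
    return p_spu, p_aspu


if __name__ == "__main__":
    cov = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.3], [0.0, 0.3, 1.0]]  # toy trait covariance
    p_spu, p_aspu = aspu([2.5, 2.0, 0.3], cov)
    for g, p in p_spu.items():
        print(f"SPU({g}): p={p:.4f}")
    print(f"aSPU: p={p_aspu:.4f}")
```

Small γ values tend to favor weak signals spread across many traits, while large γ (up to ∞) favor a strong signal concentrated in a few; aSPU lets the data choose among them while accounting for the multiple γ values through the min-p calibration.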
Bibliographic record

  • Author: Zhang, Yiwei
  • Affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subject: Biology, Biostatistics; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 168
  • Original format: PDF
  • Language: English
