Statistical Analysis of Haplotypes, Untyped SNPs, and CNVs in Genome-Wide Association Studies.

Abstract

Missing data arise in genetic association studies when one is interested in assessing the effects of haplotypes, untyped single nucleotide polymorphisms (SNPs), or copy number variants (CNVs). Haplotypes are combinations of nucleotides at multiple loci along individual homologous chromosomes, and haplotype-based analyses tend to be more efficient for detecting disease association than analyses of individual SNPs. Untyped SNPs are SNPs that are not on the genotyping chips used in the study (i.e., they are missing for all study subjects); analyzing them can help localize disease-causing variants and permits meta-analysis of association studies that used different genotyping platforms. A CNV is a duplication or deletion of a segment of DNA sequence relative to a reference genome assembly, and CNVs can play a causal role in genetic diseases.

In the first part, we provide a general likelihood-based framework for inference on the effects of haplotypes or untyped SNPs and their interactions with environmental variables. Unlike most existing methods, ours allows genetic and environmental variables to be correlated. We show that the maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient, and we develop EM algorithms to implement the corresponding inference procedures. We conduct extensive simulation studies and apply the methods to a genome-wide association study (GWAS) of lung cancer.

In the second part, we compare two approaches to the analysis of untyped SNPs. The maximum likelihood approach integrates prediction of untyped genotypes and estimation of association parameters into a single framework, and yields consistent and efficient estimators of genetic effects and gene-environment interactions together with proper variance estimators.
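The EM algorithms of the first part deal with the fact that haplotype phase cannot be read off unphased genotypes. As a minimal illustration of that missing-data idea only (not the dissertation's algorithm, which also estimates disease-association and gene-environment parameters), the sketch below estimates two-SNP haplotype frequencies by EM. It assumes Hardy-Weinberg equilibrium and genotypes coded 0/1/2 as the count of allele 1 at each SNP; the function name `em_haplotype_freqs` and the haplotype coding are hypothetical.

```python
import itertools

import numpy as np

# Hypothetical coding: haplotype index -> (allele at SNP1, allele at SNP2)
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(genotypes, n_iter=500, tol=1e-10):
    """EM estimates of the four two-SNP haplotype frequencies.

    genotypes: iterable of (g1, g2), each the count (0/1/2) of allele 1.
    Assumes Hardy-Weinberg equilibrium, so P(pair h, k) = p[h] * p[k].
    """
    genotypes = [tuple(g) for g in genotypes]
    # Map each unphased genotype to the ordered haplotype pairs that produce it
    pairs = {}
    for h1, h2 in itertools.product(range(4), repeat=2):
        g = (HAPS[h1][0] + HAPS[h2][0], HAPS[h1][1] + HAPS[h2][1])
        pairs.setdefault(g, []).append((h1, h2))

    n = len(genotypes)
    p = np.full(4, 0.25)  # start from equal frequencies
    for _ in range(n_iter):
        counts = np.zeros(4)
        for g in genotypes:
            cand = pairs[g]
            # E-step: posterior probability of each compatible phase
            w = np.array([p[a] * p[b] for a, b in cand])
            w /= w.sum()
            for (a, b), wi in zip(cand, w):
                counts[a] += wi
                counts[b] += wi
        # M-step: expected haplotype counts divided by 2n chromosomes
        new_p = counts / (2 * n)
        done = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if done:
            break
    return p
```

Only double heterozygotes, genotype (1, 1), are ambiguous; each E-step splits them between the 00/11 and 01/10 phases in proportion to the current frequency products.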
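The maximum likelihood approach of the second part treats the untyped genotype as missing data inside the likelihood. A minimal sketch, under simplifying assumptions that are ours and not the dissertation's: a quantitative trait with normal errors instead of case-control data, no environmental covariates, and per-subject genotype probabilities P(G | typed SNPs) treated as known (for example, derived from a reference panel). EM then alternates between posterior genotype weights and weighted least squares. The names `em_untyped_snp` and `simulate` are hypothetical.

```python
import numpy as np

GENOS = np.array([0.0, 1.0, 2.0])  # untyped-SNP genotype = count of the coded allele


def em_untyped_snp(y, geno_prob, n_iter=1000, tol=1e-8):
    """ML estimates of (b0, b1, s2) in y = b0 + b1*G + e, e ~ N(0, s2), G unobserved.

    geno_prob[i, g] is the assumed-known P(G_i = g | typed SNPs of subject i).
    """
    y = np.asarray(y, dtype=float)
    b0, b1, s2 = y.mean(), 0.0, y.var()
    for _ in range(n_iter):
        # E-step: P(G_i = g | y_i, typed SNPs) is proportional to prior * normal likelihood
        resid = y[:, None] - (b0 + b1 * GENOS[None, :])
        loglik = -resid ** 2 / (2 * s2)  # normal constant cancels within a subject
        loglik -= loglik.max(axis=1, keepdims=True)  # numerical stability
        w = geno_prob * np.exp(loglik)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares over all (subject, genotype) combinations
        n = w.sum()
        g_bar = (w * GENOS).sum() / n
        y_bar = (w * y[:, None]).sum() / n
        dg = GENOS - g_bar
        new_b1 = (w * dg * (y[:, None] - y_bar)).sum() / (w * dg ** 2).sum()
        new_b0 = y_bar - new_b1 * g_bar
        new_s2 = (w * (y[:, None] - new_b0 - new_b1 * GENOS) ** 2).sum() / n
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1), abs(new_s2 - s2)) < tol
        b0, b1, s2 = new_b0, new_b1, new_s2
        if converged:
            break
    return b0, b1, s2


def simulate(n, b0=1.0, b1=0.5, s2=1.0, seed=0):
    """Toy data: genotype probabilities per subject, genotypes drawn from them, then y."""
    rng = np.random.default_rng(seed)
    geno_prob = rng.dirichlet([0.5, 0.5, 0.5], size=n)
    u = rng.random(n)
    # Inverse-CDF draw of G from each subject's probabilities (clipped for rounding)
    g = np.minimum((u[:, None] > geno_prob.cumsum(axis=1)).sum(axis=1), 2)
    y = b0 + b1 * g + rng.normal(0.0, np.sqrt(s2), size=n)
    return y, geno_prob
```

For example, `y, gp = simulate(20000)` followed by `em_untyped_snp(y, gp)` should give estimates near the simulated values (1.0, 0.5, 1.0). Because the posterior weights use the trait value itself, genotype prediction and effect estimation happen in one likelihood rather than in two separate stages.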
The imputation approach is a two-stage strategy: it first imputes the untyped genotypes, using either the most likely genotypes or the expected genotype counts, and then uses the imputed values in the downstream association analysis. We conduct extensive simulation studies to compare the two approaches in terms of bias, type I error, power, and confidence interval coverage under various situations. In addition, we illustrate the methods with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC).

In the third part, we present a general framework for the integrated analysis of CNVs and SNPs in association studies, which includes the analysis of total copy number as a special case. We use allele-specific copy numbers (ASCNs) to describe both the copy number and the allelic variation at a locus, and we formulate the joint effects of CNVs and SNPs on disease in terms of ASCNs. Our approach combines ASCN calling and association analysis into a single step while allowing for differential errors, i.e., measurement errors that may differ between cases and controls. We construct likelihood functions that properly account for case-control sampling and measurement errors. We establish the asymptotic properties of the maximum likelihood estimators and develop EM algorithms to implement the proposed inference procedures. Realistic simulation studies and an application to a GWAS of schizophrenia demonstrate the advantages of the proposed methods over existing ones.
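The two imputation variants described in the second part can be sketched as follows, assuming each subject's imputed genotype probabilities P(G = 0, 1, 2) are already available from some imputation tool (not shown). Stage one produces either the most likely genotype or the expected genotype count (dosage); stage two plugs the result into an ordinary least-squares fit. The function names are hypothetical.

```python
import numpy as np

GENOS = np.array([0, 1, 2])  # genotype = count of the coded allele


def best_guess(geno_prob):
    """Stage 1, variant (a): most likely genotype per subject."""
    return np.argmax(geno_prob, axis=1)


def dosage(geno_prob):
    """Stage 1, variant (b): expected genotype count E[G] = P(G=1) + 2 * P(G=2)."""
    return geno_prob @ GENOS


def slope(x, y):
    """Stage 2: ordinary least-squares slope of y on the imputed values."""
    return np.polyfit(x, y, 1)[0]
```

Stage two treats the imputed values as if they were observed, so the uncertainty of the imputation is not propagated into the variance estimates.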
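To make the ASCN representation of the third part concrete: at a biallelic locus with alleles A and B, an ASCN is a pair (a, b) giving the number of copies of each allele, so the total copy number is a + b and the usual SNP genotypes AA, AB, BB correspond to (2, 0), (1, 1), (0, 2). The risk function below is a hypothetical logistic parameterization chosen only for illustration, not the dissertation's model; with equal allele effects it depends only on total copy number, which mirrors the special case mentioned above.

```python
import math


def ascn_states(max_total):
    """All ASCN pairs (a, b) with a + b <= max_total.

    a = copies of allele A, b = copies of allele B; a + b is the total copy number.
    """
    return [(a, t - a) for t in range(max_total + 1) for a in range(t, -1, -1)]


def logistic_risk(a, b, alpha, beta_a, beta_b):
    """Hypothetical model: logit P(disease) = alpha + beta_a * a + beta_b * b."""
    eta = alpha + beta_a * a + beta_b * b
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    for a, b in ascn_states(3):
        print(f"ASCN ({a}, {b})  total copy number {a + b}")
```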

Bibliographic details

  • Author

    Hu, Yijuan

  • Author affiliation

    The University of North Carolina at Chapel Hill

  • Degree-granting institution: The University of North Carolina at Chapel Hill
  • Subject: Biology Genetics; Statistics
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 147
  • Original format: PDF
  • Language of text: English