Estimation of haplotype frequencies from data on unrelated people.

Abstract

The estimation of haplotype frequencies has become important because using haplotype frequencies instead of individual single nucleotide polymorphisms (SNPs) has been shown to often provide higher power in genetic association studies (Olson and Wijsman, 1994). Several methods for estimating haplotype frequencies have been proposed in the literature (Excoffier and Slatkin, 1995; Hawley and Kidd, 1995; Stephens et al., 2001a; Lin et al., 2002; Niu et al., 2002; Qin et al., 2002). Two of the most popular are the expectation-maximization (EM) algorithm, which is used to obtain maximum-likelihood (ML) estimates, and a Bayesian approach with a coalescent prior, as implemented in the software PHASE. A major drawback of these methods, however, is the number of parameters that must be estimated, which limits the number of loci the algorithms can handle, especially when the number of individuals in the sample is large. Here we propose a novel method for estimating haplotype frequencies from case-control data, called the limited linkage disequilibrium (LLD) algorithm. It requires the estimation of far fewer parameters and can therefore accommodate a larger number of loci. Haplotypes found in this way to be significantly associated with disease can then be studied further with a view to finding disease-causing genetic variants.

We first estimate the allele frequencies. We then estimate the linkage disequilibrium (LD) coefficients for all possible pairs of loci by an exact estimation procedure, treating the allele frequencies as known. Next we successively estimate the three-locus and four-locus LD coefficients for all combinations of loci, at each stage holding fixed the estimates obtained so far. Finally, the haplotype frequencies are expressed in terms of the estimated allele frequencies and LD coefficients.
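For two biallelic loci, the final step described above reduces to a standard identity: each haplotype frequency is the product of the two allele frequencies, plus or minus the pairwise LD coefficient D. The Python sketch below illustrates only that two-locus case. The function name and the (p_a, p_b, d) parametrization are assumptions made for illustration; the abstract does not give the thesis's own notation or its higher-order expansion.

```python
def two_locus_haplotype_freqs(p_a, p_b, d, tol=1e-12):
    """Haplotype frequencies for two biallelic loci (alleles A/a and B/b).

    p_a, p_b : frequencies of alleles A and B (assumed already estimated)
    d        : pairwise LD coefficient, D = P(AB) - p_a * p_b
    tol      : hypothetical tolerance for floating-point bound checks
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b

    # D is only valid if every haplotype frequency stays non-negative.
    d_min = max(-p_a * p_b, -q_a * q_b)
    d_max = min(p_a * q_b, q_a * p_b)
    if not (d_min - tol <= d <= d_max + tol):
        raise ValueError(f"D={d} outside feasible range [{d_min}, {d_max}]")

    return {
        "AB": p_a * p_b + d,
        "Ab": p_a * q_b - d,
        "aB": q_a * p_b - d,
        "ab": q_a * q_b + d,
    }


if __name__ == "__main__":
    freqs = two_locus_haplotype_freqs(0.6, 0.3, 0.05)
    for hap, f in freqs.items():
        print(hap, round(f, 4))
```

By construction the four frequencies sum to one and reproduce the allele-frequency margins, so D is the only extra parameter beyond the two allele frequencies.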
Because we limit the number of stages, assuming that the higher-order disequilibrium coefficients are zero, our method estimates the haplotype frequencies as functions of fewer parameters and can therefore handle a larger number of loci, even when the sample size is large. The LLD algorithm estimates the haplotype frequencies efficiently, with minimal absolute error in the estimates. Moreover, the estimates are virtually unaffected by deviations from Hardy-Weinberg equilibrium, even though the method assumes that Hardy-Weinberg equilibrium holds at the loci.
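To give a rough sense of the parameter savings, the sketch below compares the number of free parameters in an unrestricted haplotype model with the number retained when coefficients above fourth order are set to zero. It assumes biallelic loci and one coefficient per subset of loci at each order; neither assumption is stated in the abstract, and both function names are hypothetical.

```python
from math import comb


def full_model_parameter_count(n_loci):
    """Free parameters when every haplotype frequency is estimated.

    With biallelic loci there are 2**n_loci haplotypes whose frequencies
    sum to one, leaving 2**n_loci - 1 free parameters.
    """
    return 2 ** n_loci - 1


def truncated_parameter_count(n_loci, max_order=4):
    """Allele frequencies plus LD coefficients up to max_order.

    Order 1 counts the allele frequencies; order k counts one k-locus
    coefficient per subset of k loci (an assumption of this sketch).
    """
    return sum(comb(n_loci, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, truncated_parameter_count(n), full_model_parameter_count(n))
```

Under these assumptions the two counts coincide for four or fewer loci, and the truncated count grows polynomially with the number of loci while the unrestricted count grows exponentially.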

Bibliographic record

  • Author

    Sinha, Moumita

  • Author affiliation

    Case Western Reserve University

  • Degree-granting institution: Case Western Reserve University
  • Subjects: Biology, Biostatistics; Biology, Genetics; Statistics
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 198
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biomathematical methods; Genetics; Statistics
