Haplotype inference using a hidden Markov model with efficient Markov chain sampling.

Abstract

Knowledge of haplotypes is useful for understanding the block structure of the genome and for finding genes associated with disease. Direct measurement of haplotypes in the absence of family data is presently impractical, so several methods have been developed for reconstructing haplotypes from population data. In this thesis, a new population-based method is developed using a Hidden Markov Model (HMM) for the source of ancestral haplotype segments. A higher-order Markov model is used to account for linkage disequilibrium in the ancestral haplotypes. The HMM includes parameters for the genotyping error rate, the mutation rate, and the recombination rate. Four mutation models with varying numbers of parameters are developed and compared. Parameters of the model are inferred by Bayesian methods, using Markov Chain Monte Carlo (MCMC). Crucial to the efficiency of the Markov chain sampling is the use of a Forward-Backward algorithm for summing over all possible state sequences of the HMM. The model is tested by reconstructing the haplotypes of 129 children in the data set of Daly et al. (2001) and of 30 children in the CEU and YRI data of the HapMap project. For these data sets, family-based haplotype reconstructions found using MERLIN (Abecasis et al. 2002) are used to check the correctness of the population-based reconstructions. The results of this HMM method are quite close to the family-based reconstructions and comparable to those of the PHASE program (Stephens et al. 2001, Stephens and Donnelly 2003, Stephens and Scheet 2005) and the fastPHASE program (Scheet and Stephens 2006). The recombination rates inferred by this HMM method can help to predict haplotype block boundaries and identify recombination hotspots.
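
The abstract credits a Forward-Backward algorithm with making the sum over all HMM state sequences tractable. The sketch below illustrates that general idea on a toy discrete HMM; it is not the thesis's model (which uses a higher-order Markov structure plus genotyping-error, mutation, and recombination parameters). The function names, the two hidden states, and all probabilities are hypothetical values chosen for illustration. A brute-force enumeration is included only to show what the recursion replaces.

```python
import math
from itertools import product


def forward_backward(init, trans, emit, obs):
    """Scaled forward-backward pass for a discrete HMM (illustrative sketch).

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(observation = o | state = i)
    Returns (log_likelihood, posteriors), where
    posteriors[t][i] = P(state_t = i | all observations).
    """
    n, T = len(init), len(obs)
    alpha, scale = [], []
    # Forward pass: normalise at every step to avoid numerical underflow.
    for t in range(T):
        if t == 0:
            a = [init[i] * emit[i][obs[0]] for i in range(n)]
        else:
            prev = alpha[-1]
            a = [emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    # The product of the scaling constants is the likelihood of the data.
    log_lik = sum(math.log(c) for c in scale)
    # Backward pass, reusing the same scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    post = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    return log_lik, post


def brute_force_likelihood(init, trans, emit, obs):
    """Sum the joint probability over every state path (exponential cost)."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical toy setup: two hidden states, binary observed alleles.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.95, 0.05], [0.10, 0.90]]
OBS = [0, 0, 1, 1, 0]

log_lik, posteriors = forward_backward(INIT, TRANS, EMIT, OBS)
```

For n states and T sites, the recursion costs on the order of n²T operations, while the enumeration visits n^T paths. That gap is why a forward-backward pass is a natural building block inside an MCMC sampler that must evaluate the likelihood repeatedly.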

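Bibliographic record

  • Author: Sun, Shuying
  • Author affiliation: University of Toronto (Canada)
  • Degree-granting institution: University of Toronto (Canada)
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 143
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Statistics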
