Journal: Genome Research

An MCMC algorithm for haplotype assembly from whole-genome sequence data.

Abstract

In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational methods. Whole-genome sequence data represent a promising resource for constructing haplotypes spanning hundreds of kilobases for an individual. In this article, we propose a Markov chain Monte Carlo (MCMC) algorithm, HASH (haplotype assembly for single human), for assembling haplotypes from sequenced DNA fragments that have been mapped to a reference genome assembly. The transitions of the Markov chain are generated using min-cut computations on graphs derived from the sequenced fragments. We have applied our method to infer haplotypes using whole-genome shotgun sequence data from a recently sequenced human individual. The high sequence coverage and presence of mate pairs result in fairly long haplotypes (N50 length ~ 350 kb). Based on comparison of the sequenced fragments against the individual haplotypes, we demonstrate that the haplotypes for this individual inferred using HASH are significantly more accurate than the haplotypes estimated using a previously proposed greedy heuristic and a simple MCMC method. Using haplotypes from the HapMap project, we estimate the switch error rate of the haplotypes inferred using HASH to be quite low, ~1.1%. Our Markov chain Monte Carlo algorithm represents a general framework for haplotype assembly that can be applied to sequence data generated by other sequencing technologies. The code implementing the methods and the phased individual haplotypes can be downloaded from (http://www.cse.ucsd.edu/users/vibansal/HASH/).
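The abstract reports two summary statistics: an N50 haplotype length and a switch error rate. The sketch below shows how these quantities are commonly computed. It is an illustration under stated assumptions, not the HASH code: the function names `n50` and `switch_error_rate` are hypothetical, haplotypes are assumed to be encoded as strings of 0/1 alleles over heterozygous sites, and the reference haplotype is assumed to be known at every site.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of block lengths.

    Common definition: the largest length L such that blocks of
    length >= L together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


def switch_error_rate(inferred: str, truth: str) -> float:
    """Fraction of adjacent site pairs whose relative phase disagrees.

    Assumption: both arguments list the allele (0 or 1) carried by one
    chromosome at the same heterozygous sites, so the other chromosome
    is the bitwise complement and does not need to be passed in.
    """
    if len(inferred) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if len(inferred) < 2:
        raise ValueError("need at least two sites to define a switch")
    # True where the inferred allele agrees with the reference haplotype.
    agrees = [a == b for a, b in zip(inferred, truth)]
    # A switch occurs whenever agreement flips between neighbouring sites.
    switches = sum(1 for prev, cur in zip(agrees, agrees[1:]) if prev != cur)
    return switches / (len(agrees) - 1)
```

Under this definition, an inferred haplotype that is the exact complement of the reference has a switch error rate of zero, since the two chromosomes of a phased block are interchangeable. Only changes in relative phase between adjacent sites are counted.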
