Journal of Computational Biology

Hap-seq: An Optimal Algorithm for Haplotype Phasing with Imputation Using Sequencing Data


Abstract

Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identification of regions of identity by descent, and imputation. Traditionally, haplotypes are inferred from genotype data obtained from microarrays, using information on population haplotype frequencies inferred from either a large sample of genotyped individuals or a reference dataset such as the HapMap. Since large reference datasets became available, modern approaches for haplotype phasing along these lines have been closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes so as to minimize the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and assembly method that combines information from a reference dataset with the information from the reads using a likelihood framework. Our method applies a dynamic programming algorithm to identify the predicted haplotype that maximizes the joint likelihood of the haplotype with respect to the reference dataset and with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy than state-of-the-art imputation methods.
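To make the idea of a joint likelihood concrete, the following is a toy sketch and not the Hap-seq algorithm itself. It scores every pair of candidate haplotypes over a handful of biallelic SNPs by adding a read term and a reference term, then keeps the best pair. Several things here are assumptions made only for illustration: the search is brute-force enumeration rather than dynamic programming, the reference term is a smoothed haplotype frequency in a small panel rather than a proper population model, each read is assumed to come from either chromosome with probability 1/2, and all names (`read_log_likelihood`, `reference_log_prior`, `best_haplotype_pair`, `eps`, `pseudo`) are hypothetical.

```python
import itertools
import math


def read_log_likelihood(read, hap, eps):
    """Log-probability of one read given the haplotype it was drawn from.

    `read` maps SNP index -> observed allele (0 or 1); `eps` is an assumed
    per-allele sequencing error rate.
    """
    return sum(
        math.log(1 - eps if hap[pos] == allele else eps)
        for pos, allele in read.items()
    )


def reads_log_likelihood(reads, h1, h2, eps):
    """Read term: each read comes from h1 or h2 with probability 1/2."""
    total = 0.0
    for read in reads:
        a = read_log_likelihood(read, h1, eps)
        b = read_log_likelihood(read, h2, eps)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))
    return total


def reference_log_prior(hap, panel, pseudo=0.5):
    """Reference term (simplifying assumption): smoothed panel frequency."""
    count = sum(1 for ref in panel if ref == hap)
    n_possible = 2 ** len(hap)
    return math.log((count + pseudo) / (len(panel) + pseudo * n_possible))


def best_haplotype_pair(reads, panel, n_snps, eps=0.01):
    """Enumerate all unordered haplotype pairs and maximize the joint score."""
    haps = list(itertools.product((0, 1), repeat=n_snps))
    best, best_score = None, -math.inf
    for h1, h2 in itertools.combinations_with_replacement(haps, 2):
        score = (
            reads_log_likelihood(reads, h1, h2, eps)
            + reference_log_prior(h1, panel)
            + reference_log_prior(h2, panel)
        )
        if score > best_score:
            best, best_score = (h1, h2), score
    return best, best_score


# Toy inputs (made up for illustration): 3 SNPs, 5 reference haplotypes,
# and 4 short reads that each cover two adjacent SNPs.
panel = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 0), (0, 1, 0)]
reads = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {1: 1, 2: 0}]
pair, score = best_haplotype_pair(reads, panel, n_snps=3)
print(pair, round(score, 3))
```

Enumeration grows exponentially with the number of SNPs, so it is usable only on tiny examples. It serves here to show how read evidence (alleles on the same read share a chromosome) and reference evidence enter one objective.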
