Genotype calling from next-generation sequencing data using haplotype information of reads

Abstract

Motivation: Low-coverage sequencing provides an economical strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD)-based refinement of genotype calling is essential to improve accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information that current methods overlook.

Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that takes into account jumping-read information across adjacent PPSs, and we implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping-read information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12% and 9%, from 2.24% and 2.76% to 1.97% and 2.50%, for individuals with European and African ancestry, respectively. We expect that our program can improve the genotyping quality of the large number of ongoing and planned whole genome sequencing projects.

Availability: The software package HapSeq and its manual can be found and downloaded at .

Supplementary information: Supplementary data are available at Bioinformatics online.
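To illustrate why a read spanning two adjacent PPSs carries information that per-site read counts miss, here is a minimal Python sketch. It is not HapSeq's actual emission model. It assumes a toy setting with biallelic sites coded 0/1, a read drawn from either of an individual's two haplotypes with probability 1/2, and an independent per-site sequencing error rate `err`. All function names and the value `err=0.01` are hypothetical.

```python
def read_given_haplotype(read, hap, err):
    """P(observed read alleles | source haplotype), independent per-site error (assumed)."""
    p = 1.0
    for observed, true in zip(read, hap):
        p *= (1.0 - err) if observed == true else err
    return p


def jumping_read_emission(read, hap_a, hap_b, err=0.01):
    """P(read | haplotype pair): the read comes from either haplotype with probability 1/2."""
    return 0.5 * read_given_haplotype(read, hap_a, err) + \
           0.5 * read_given_haplotype(read, hap_b, err)


def per_site_emission(read, hap_a, hap_b, err=0.01):
    """Product of single-site emissions, ignoring that both alleles sit on one read."""
    p = 1.0
    for i, observed in enumerate(read):
        p *= jumping_read_emission((observed,), (hap_a[i],), (hap_b[i],), err)
    return p


# One read showing allele 0 at both adjacent sites.
read = (0, 0)

# Two phasings of the same double-heterozygous genotype.
cis = ((0, 0), (1, 1))    # haplotypes 00 and 11
trans = ((0, 1), (1, 0))  # haplotypes 01 and 10

joint_cis = jumping_read_emission(read, *cis)
joint_trans = jumping_read_emission(read, *trans)
site_cis = per_site_emission(read, *cis)
site_trans = per_site_emission(read, *trans)

print(f"joint emission:    cis={joint_cis:.4f}  trans={joint_trans:.4f}")
print(f"per-site emission: cis={site_cis:.4f}  trans={site_trans:.4f}")
```

Under these assumptions the per-site product is identical for both phasings, because each site is heterozygous either way, while the joint emission strongly favors the phasing that contains the observed 0-0 haplotype. That difference is the kind of haplotype signal the abstract attributes to jumping reads.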
