Conference: Annual International Conference on Research in Computational Molecular Biology

WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads

Abstract

The human genome is diploid: each of its chromosomes comes in two copies. Beyond detecting single nucleotide polymorphisms (SNPs), one must therefore also phase them, that is, assign them to the two copies. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which avoid making use of direct read information, constitute the state of the art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that current-generation sequencing reads are too short to serve the purposes of genome-wide phasing. Future sequencing technologies, however, promise to generate reads whose lengths and error rates make it possible to bridge all SNP positions in the genome, with a sufficient number of SNPs per read. Yet existing haplotype assembly approaches profit computationally from precisely the limited length of current-generation reads, because their runtime is usually exponential in the number of SNPs per sequencing read. This implies that such approaches will not be able to exploit the benefits of sufficiently long, future-generation reads. Here, we propose WhatsHap, a novel dynamic programming approach to haplotype assembly. It is the first approach that yields provably optimal solutions to the weighted minimum error correction (wMEC) problem in runtime linear in the number of SNPs per sequencing read, making it suitable for future-generation reads. WhatsHap is a fixed-parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20x, processing chromosomes on standard workstations in only 1-2 hours. Our simulation study shows that the quality of haplotypes assembled by WhatsHap significantly improves with increasing read length, in terms of both genome coverage and switch errors. The switch error rates we achieve in our simulations are superior to those obtained by state-of-the-art statistical phasers.
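
To make the wMEC objective named in the abstract concrete, the following is a minimal sketch that scores a bipartition of reads and finds the cheapest one by exhaustive search. It is not WhatsHap's dynamic program: it enumerates every bipartition, so its runtime is exponential in the number of reads, and it is only usable on toy inputs. The read representation (a dict from SNP position to an `(allele, weight)` pair), the function names, and the example weights are all illustrative assumptions. The sketch also assumes that the two read groups are scored independently, with no constraint that every SNP is heterozygous, and that at least one read is given.

```python
from itertools import product

def column_cost(entries):
    """Cheapest total weight of flips that makes one SNP column of one group consistent."""
    w0 = sum(w for allele, w in entries if allele == 0)
    w1 = sum(w for allele, w in entries if allele == 1)
    # Either flip all 0s to 1 or all 1s to 0, whichever is cheaper.
    return min(w0, w1)

def wmec_cost(reads, assignment):
    """Weighted error-correction cost of a given assignment of reads to groups 0/1."""
    n_snps = max(pos for r in reads for pos in r) + 1
    total = 0
    for pos in range(n_snps):
        for group in (0, 1):
            entries = [r[pos] for r, g in zip(reads, assignment)
                       if g == group and pos in r]
            total += column_cost(entries)
    return total

def brute_force_wmec(reads):
    """Try every bipartition (first read fixed to group 0 to skip mirror images)."""
    best = None
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost = wmec_cost(reads, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best

# Hypothetical toy input: position -> (allele, weight of flipping that allele).
reads = [
    {0: (0, 5), 1: (1, 5)},
    {0: (0, 5), 1: (1, 5), 2: (0, 5)},
    {1: (0, 5), 2: (1, 5)},
    {0: (1, 5), 1: (0, 5), 2: (0, 1)},  # low-confidence call at position 2
]
print(brute_force_wmec(reads))  # (1, (0, 0, 1, 1))
```

In this toy input the first two reads support one haplotype and the last two the other; the only correction needed is the low-weight allele at position 2 of the last read, giving a total cost of 1.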
