Journal: BMC Bioinformatics

Sealer: a scalable gap-closing application for finishing draft genomes

Abstract

Background: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes.

Results: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study.

Conclusion: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release .
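
The idea in the abstract (closing a scaffold gap by walking a de Bruijn graph whose k-mers are stored in a Bloom filter) can be illustrated with a small Python sketch. This is not Sealer's implementation. The class and function names (BloomFilter, load_reads, find_path, fill_gap), the toy genome, the value k = 6 and the filter size are all assumptions made for illustration. The sketch also ignores reverse complements and sequencing errors, handles a single gap, and simply returns the shortest connecting path.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Bit array queried through several hashes; may report false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def load_reads(reads, k, bloom):
    """Insert every k-mer of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def find_path(start, goal, bloom, max_len=1000):
    """Breadth-first walk of the implicit de Bruijn graph from start to goal.

    Edges are not stored: each of the four possible successor k-mers is
    tested for membership in the Bloom filter.
    """
    queue = deque([(start, start)])  # (current k-mer, sequence spelled so far)
    seen = {start}
    while queue:
        kmer, path = queue.popleft()
        if kmer == goal:
            return path
        if len(path) >= max_len:
            continue
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in bloom and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + base))
    return None


def fill_gap(scaffold, k, bloom):
    """Replace the single run of N in scaffold with a path through the graph."""
    left, _, rest = scaffold.partition("N")
    right = rest.lstrip("N")
    path = find_path(left[-k:], right[:k], bloom)
    if path is None:
        return scaffold  # gap stays open
    return left[:-k] + path + right[k:]


# Toy data (hypothetical): a 32-base "genome", three overlapping reads,
# and a scaffold in which bases 12..19 are replaced by N.
K = 6
genome = "ATGGCGTACCTTAGACGGATTCAAGCTGTCCA"
reads = [genome[0:14], genome[8:24], genome[18:32]]
scaffold = genome[:12] + "N" * 8 + genome[20:]

bloom = BloomFilter()
load_reads(reads, K, bloom)
closed = fill_gap(scaffold, K, bloom)
print(closed)
```

Because the graph is never materialised, memory use depends on the size of the bit array and not on the number of stored k-mers, which is the property the abstract credits for scaling to very large genomes. A production tool would additionally need to deal with false-positive k-mers, both strands, and gaps that have several candidate paths.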