Journal: Bioinformatics

Compacting de Bruijn graphs from sequencing data quickly and in low memory

Abstract

Motivation: As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem.

Results: We present an algorithm and a tool bcalm 2 for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. We also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods.

Availability and Implementation: Source code of bcalm 2 is freely available at:

Contact:
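
To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) compacting maximal simple paths of a de Bruijn graph into single strings and (b) bucketing k-mers by a minimizer. It is an illustration under simplifying assumptions, not bcalm 2's implementation: it works on the forward strand only (no reverse complements), holds all k-mers in memory, and uses the lexicographically smallest m-mer as the minimizer. The function names `kmer_set`, `minimizer`, `partition_by_minimizer`, and `compact` are hypothetical.

```python
from collections import defaultdict

def kmer_set(reads, k):
    """Collect the distinct k-mers of all reads (nodes of the graph)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

def minimizer(kmer, m):
    """Assumed ordering: the lexicographically smallest m-mer of the k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

def partition_by_minimizer(kmers, m):
    """Group k-mers into buckets keyed by their minimizer."""
    buckets = defaultdict(set)
    for km in kmers:
        buckets[minimizer(km, m)].add(km)
    return dict(buckets)

def compact(kmers):
    """Merge each maximal non-branching path into one string."""
    def succs(km):
        return [km[1:] + c for c in "ACGT" if km[1:] + c in kmers]

    def preds(km):
        return [c + km[:-1] for c in "ACGT" if c + km[:-1] in kmers]

    def starts_path(km):
        # A path starts here unless the k-mer has exactly one predecessor
        # and that predecessor has exactly one successor.
        p = preds(km)
        return len(p) != 1 or len(succs(p[0])) != 1

    def extend(km, visited):
        # Walk forward while the edge is the only way out and the only way in.
        path, cur = km, km
        visited.add(km)
        while True:
            s = succs(cur)
            if len(s) != 1 or len(preds(s[0])) != 1 or s[0] in visited:
                return path
            cur = s[0]
            visited.add(cur)
            path += cur[-1]

    visited, paths = set(), []
    for km in sorted(kmers):
        if km not in visited and starts_path(km):
            paths.append(extend(km, visited))
    # Whatever remains lies on isolated cycles; break each at its smallest k-mer.
    for km in sorted(kmers):
        if km not in visited:
            paths.append(extend(km, visited))
    return paths

if __name__ == "__main__":
    nodes = kmer_set(["ATGCA", "TGCT"], 3)
    print(compact(nodes))
    print(partition_by_minimizer(nodes, 2))
```

In this toy input the k-mer TGC has two successors, so the path ATG, TGC is merged into one string while GCA and GCT stay separate. Bucketing by minimizer is what lets the input be split into pieces that can be processed with bounded memory; how bcalm 2 merges paths across buckets is not covered by this sketch.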