MULKSG: MULtiple K Simultaneous Graph Assembly

Abstract

This work shows how to parallelize multi-K de Bruijn graph genome assembly so that all k values are processed simultaneously, removing the bottleneck of iterative multi-K assembly. Execution time on a single node with 40 cores varies by dataset: averaged over the 16 datasets tested, the entire pipeline took 1613 s with SPAdes vs. 1581 s with MULKSG, and MULKSG graph creation and traversal were on average 15% faster than in SPAdes. We also provide a multi-node implementation of the graph creation and traversal portions of the assembly, with the speedups shown in Fig. 4. We show that, when implemented correctly with the correction phases performed per graph in parallel, the result is very close to that of the original method, in some cases having fewer errors while keeping the same NGA50 and genome coverage %. We show this works in practice by implementing it with the popular genome assembler SPAdes. Further, this algorithmic change removes the single-node sequential bottleneck of multi-K genome assembly, allowing parallel error correction, graph building, graph correction, and graph traversal. We implement a parallel version of the assembly and show that the statistics are the same as when it is run on a single node. The code is open source and can be found at https://github.com/cwright7101/mulksg.
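The central observation, that the graph for each k value is independent of the others and can therefore be built concurrently instead of one k after another, can be illustrated with a minimal Python sketch. This is not MULKSG's code: the functions `build_de_bruijn` and `build_all_k` are hypothetical, there is no error correction, graph correction, or traversal, and a thread pool is used only to keep the example self-contained (CPU-bound work in CPython would normally use a process pool or, as the paper does, multiple nodes).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_de_bruijn(reads, k):
    """Build a simple de Bruijn graph for one k (hypothetical helper).

    Each k-mer in a read becomes an edge from its (k-1)-prefix to its
    (k-1)-suffix. Returns (k, adjacency dict of node -> set of successors).
    """
    graph = defaultdict(set)
    for read in reads:
        # Reads shorter than k contribute no k-mers (range is empty).
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return k, dict(graph)


def build_all_k(reads, ks, workers=None):
    """Build one independent graph per k concurrently (hypothetical helper).

    Nothing here depends on the result for a smaller k, which is what
    allows the k values to be processed at the same time.
    """
    # Threads keep the sketch portable; they do not give CPU parallelism
    # in CPython, so a real pipeline would use processes or separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(build_de_bruijn, reads, k) for k in ks]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    reads = ["ACGTACGGA", "GTACGGATC"]
    graphs = build_all_k(reads, [3, 5, 7])
    for k, g in sorted(graphs.items()):
        print(f"k={k}: {len(g)} nodes with outgoing edges")
```

Because each k yields its own graph, per-graph steps such as error correction could in principle be dispatched the same way before the per-k results are combined; how MULKSG combines them is not described in this abstract.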