International Conference for High Performance Computing, Networking, Storage and Analysis

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly

Abstract

De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages the one-sided communication capabilities of Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on the human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
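
The three phases named in the abstract (k-mer analysis, de Bruijn graph construction, and traversal) can be illustrated with a small serial sketch. This is not the Meraculous or UPC algorithm: it uses an ordinary in-memory Python dict where the paper uses a distributed hash table, it ignores reverse complements, and all function names and the `min_count` parameter are hypothetical choices made for illustration. A count threshold is a common way to discard k-mers that are likely caused by read errors.

```python
from collections import Counter

BASES = "ACGT"

def count_kmers(reads, k):
    """k-mer analysis: count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_graph(counts, min_count=1):
    """Graph construction: map each retained k-mer to its possible
    left and right extension bases (a plain dict stands in for the
    distributed hash table)."""
    kmers = {km for km, c in counts.items() if c >= min_count}
    graph = {}
    for km in kmers:
        left = [b for b in BASES if b + km[:-1] in kmers]
        right = [b for b in BASES if km[1:] + b in kmers]
        graph[km] = (left, right)
    return graph

def traverse(graph):
    """Traversal: grow a contig from each unvisited seed k-mer while
    the next step is unambiguous in both directions."""
    visited = set()
    contigs = []
    for seed in sorted(graph):
        if seed in visited:
            continue
        visited.add(seed)
        contig = seed
        cur = seed
        while True:  # extend to the right
            right = graph[cur][1]
            if len(right) != 1:
                break
            nxt = cur[1:] + right[0]
            if nxt in visited or len(graph[nxt][0]) != 1:
                break
            visited.add(nxt)
            contig += right[0]
            cur = nxt
        cur = seed
        while True:  # extend to the left
            left = graph[cur][0]
            if len(left) != 1:
                break
            prv = left[0] + cur[:-1]
            if prv in visited or len(graph[prv][1]) != 1:
                break
            visited.add(prv)
            contig = left[0] + contig
            cur = prv
        contigs.append(contig)
    return contigs

reads = ["ATGGCGT", "GGCGTACA"]  # two overlapping toy reads
graph = build_graph(count_kmers(reads, k=4))
print(traverse(graph))  # ['ATGGCGTACA']
```

In this toy version every lookup is a local dict access. In the setting the abstract describes, the same lookups would target a hash table partitioned across many cores, which is where one-sided communication and the avoidance of data hazards become relevant.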
