GPU-Accelerated Large-Scale Genome Assembly


Abstract

Spurred by a widening gap between hardware accelerators and traditional processors, numerous bioinformatics applications have harnessed the computing power of GPUs and reported substantial performance improvements compared to their CPU-based counterparts. However, most of these GPU-based applications only focus on the read alignment problem, while the field of de novo assembly still relies mostly on CPU-based solutions. This is primarily due to the nature of the assembly workload which is not only compute-intensive but also extremely data-intensive. Such workloads require large memories, making it difficult to adapt them to use GPUs with their limited memory capacities. To the best of our knowledge, no GPU-based assembler reported in the recent literature has attempted to assemble datasets larger than a few tens of gigabytes, whereas real sequence datasets are often several hundreds of gigabytes in size. In this paper, we present a new GPU-accelerated genome assembler called LaSAGNA, which can assemble large-scale sequence datasets using a single GPU by building string graphs from approximate all-pair overlaps. LaSAGNA can also run on multiple GPUs across multiple compute nodes connected by a high-speed network to expedite the assembly process. To utilize the limited memory on GPUs efficiently, LaSAGNA uses a semi-streaming approach that makes at most a logarithmic number of passes over the input data based on the available memory. Moreover, we propose a two-level streaming model, from disk to host memory and from host memory to device memory, to minimize disk I/O. Using LaSAGNA, we can assemble a 400 GB human genome dataset on a single NVIDIA K40 GPU in 17 hours, and in a little over 5 hours on an 8-node cluster of NVIDIA K20s.
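The two-level streaming idea mentioned in the abstract (disk to host memory, then host memory to device memory) can be illustrated with a minimal sketch. This is not LaSAGNA's implementation: the function name `stream_two_level` and the parameters `host_bytes` and `device_bytes` are hypothetical, the "device" is simulated by yielding byte slices, and a real system would issue a GPU copy at the marked step. The point is only that the source is read once in large host-sized blocks, and each block is then fed onward in smaller device-sized batches.

```python
import io
from typing import BinaryIO, Iterator


def stream_two_level(source: BinaryIO, host_bytes: int, device_bytes: int) -> Iterator[bytes]:
    """Yield device-sized batches while reading the source in host-sized blocks.

    Hypothetical helper for illustration only; buffer sizes are assumptions.
    """
    if device_bytes <= 0 or host_bytes < device_bytes:
        raise ValueError("need 0 < device_bytes <= host_bytes")
    while True:
        # Level 1: disk -> host memory (one large sequential read).
        host_block = source.read(host_bytes)
        if not host_block:
            break
        # Level 2: host memory -> device memory, in smaller batches.
        # A real assembler would copy each batch to the GPU here.
        for start in range(0, len(host_block), device_bytes):
            yield host_block[start:start + device_bytes]


# Usage: 40 bytes of reads, a 16-byte host buffer, an 8-byte device buffer.
data = b"ACGT" * 10
batches = list(stream_two_level(io.BytesIO(data), host_bytes=16, device_bytes=8))
```

Each byte is read from the source exactly once, which is the property that keeps disk I/O low in such a scheme; the batch sizes here are toy values chosen only to make the behavior easy to check.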

Bibliographic record

Source
IEEE International Parallel and Distributed Processing Symposium, 2018, pp. 814-824 (11 pages)
Authors
Sayan Goswami; Kisung Lee; Shayan Shams; Seung-Jong Park
Original format: PDF
Keywords
Bioinformatics; Genomics; Graphics processing units; Memory management; Tools; Sequential analysis; Computational modeling

