GPU-Accelerated Large-Scale Genome Assembly

Abstract

Spurred by a widening gap between hardware accelerators and traditional processors, numerous bioinformatics applications have harnessed the computing power of GPUs and reported substantial performance improvements over their CPU-based counterparts. However, most of these GPU-based applications focus only on the read alignment problem, while the field of de novo assembly still relies mostly on CPU-based solutions. This is primarily due to the nature of the assembly workload, which is not only compute-intensive but also extremely data-intensive. Such workloads require large amounts of memory, which makes them difficult to adapt to GPUs with their limited memory capacity. To the best of our knowledge, no GPU-based assembler reported in the recent literature has attempted to assemble datasets larger than a few tens of gigabytes, whereas real sequence datasets are often several hundred gigabytes in size. In this paper, we present LaSAGNA, a new GPU-accelerated genome assembler that can assemble large-scale sequence datasets on a single GPU by building string graphs from approximate all-pair overlaps. LaSAGNA can also run on multiple GPUs across multiple compute nodes connected by a high-speed network to expedite the assembly process. To use the limited GPU memory efficiently, LaSAGNA adopts a semi-streaming approach that makes at most a logarithmic number of passes over the input data, depending on the available memory. Moreover, we propose a two-level streaming model, from disk to host memory and from host memory to device memory, to minimize disk I/O. Using LaSAGNA, we can assemble a 400 GB human genome dataset on a single NVIDIA K40 GPU in 17 hours, and in a little over 5 hours on an 8-node cluster of NVIDIA K20s.
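
For readers unfamiliar with overlap-based assembly, the following is a minimal CPU sketch of the general idea behind computing all-pair suffix-prefix overlaps between reads, the relationships from which the edges of a string graph are built. It fingerprints read prefixes and suffixes with a polynomial rolling hash; this is an illustrative assumption, not a description of LaSAGNA's GPU kernels, fingerprint scheme, or streaming logic, which this record does not detail. The names `prefix_hashes`, `find_overlaps`, `BASE`, and `MOD` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fingerprint parameters (not taken from LaSAGNA).
BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus


def prefix_hashes(s):
    """Return h where h[i] is the polynomial fingerprint of s[:i]."""
    h = [0] * (len(s) + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * BASE + ord(ch)) % MOD
    return h


def find_overlaps(reads, min_overlap):
    """Map (a, b) -> length of the longest proper overlap in which a suffix of
    reads[a] equals a prefix of reads[b] (a != b, length >= min_overlap)."""
    hashes = [prefix_hashes(r) for r in reads]

    # Index every proper prefix (shorter than its read) by (length, fingerprint).
    index = defaultdict(list)
    for rid, h in enumerate(hashes):
        for length in range(min_overlap, len(reads[rid])):
            index[(length, h[length])].append(rid)

    overlaps = {}
    for a, read in enumerate(reads):
        n, h = len(read), hashes[a]
        # Scan suffix lengths from longest to shortest so the first hit
        # recorded for each (a, b) pair is the longest overlap.
        for length in range(n - 1, min_overlap - 1, -1):
            start = n - length
            # Fingerprint of read[start:], derived from the prefix fingerprints.
            fp = (h[n] - h[start] * pow(BASE, length, MOD)) % MOD
            for b in index.get((length, fp), ()):
                # Verify with a direct comparison to rule out hash collisions.
                if b != a and (a, b) not in overlaps and read[start:] == reads[b][:length]:
                    overlaps[(a, b)] = length
    return overlaps


reads = ["ACGTAC", "TACGGA", "GGATTT"]
print(find_overlaps(reads, 3))  # {(0, 1): 3, (1, 2): 3}
```

Because every candidate pair is verified by direct string comparison, this toy reports exact overlaps. It also keeps every fingerprint in a single in-memory dictionary, so it sidesteps the memory limits that the semi-streaming design described in the abstract is meant to address.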

Bibliographic Record

Source: IEEE International Parallel and Distributed Processing Symposium | 2018 | p. 588 | 11 pages
Authors: Sayan Goswami; Kisung Lee; Shayan Shams; Seung-Jong Park
Original format: PDF
Chinese Library Classification: TP311.133
Keywords: Bioinformatics; Genomics; Graphics processing units; Memory management; Tools; Sequential analysis; Computational modeling

Similar Literature

1. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations [J]. Andrew J. Schroeder, David B. Emmert, Gilberto dos Santos. Nucleic Acids Research, 2015, Issue D1.
2. Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity [J]. Nicholas D. Youngblut, Jacobo de la Cuesta-Zuluaga, Georg H. Reischer. mSystems, 2020, Issue 6.
3. Programmed chromosome fission and fusion enable precise large-scale genome rearrangement and assembly [J]. Kaihang Wang, Daniel de la Torre, Wesley E. Robertson. Science, 2019, Issue 6456.
4. GPU-Accelerated Large-Scale Genome Assembly [C]. Sayan Goswami, Kisung Lee, Shayan Shams. IEEE International Parallel and Distributed Processing Symposium, 2018.
5. Large-scale microarray data analysis using GPU-accelerated linear algebra libraries [D]. Yun Zhang, 2012.
6. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations [O]. Gilberto dos Santos, Andrew J. Schroeder, Joshua L. Goodman, 2015.
7. A Robust Method for Finding the Automated Best Matched Genes Based on Grouping Similar Fragments of Large-Scale References for Genome Assembly [O]. Jaehee Jung, Jong Im Kim, Young-Sik Jeong, 2017.