Conference: AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters



Abstract

Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPU) are now being augmented with multiple GPUs in each compute-node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
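The central implementation idea in the abstract is overlapping both host-device data transfers and MPI communications with computation on the GPU. The following is a minimal MPI-CUDA sketch of that pattern for a Jacobi-style update on a 1-D slab decomposition along z; the kernel names, grid sizes, buffer layout, and neighbor arrangement are illustrative assumptions, not the authors' actual implementation.

/* Sketch: interior update runs on one CUDA stream while boundary planes are
 * copied to pinned host buffers and exchanged with non-blocking MPI; the
 * boundary update is launched once the halos arrive. All names and sizes
 * below are assumptions for illustration. */
#include <mpi.h>
#include <cuda_runtime.h>

#define NX 256
#define NY 256
#define NZ_LOCAL 64          /* z-planes owned by this rank          */
#define SLICE (NX * NY)      /* points in one z-plane (halo layer)   */

__global__ void jacobi_interior(double *unew, const double *uold)
{ /* stencil update of z-planes 2..NZ_LOCAL-1 would go here (omitted) */ }

__global__ void jacobi_boundary(double *unew, const double *uold)
{ /* stencil update of z-planes 1 and NZ_LOCAL (needs halos; omitted) */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int below = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int above = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* NZ_LOCAL owned planes plus one halo plane on each side. */
    size_t bytes = (size_t)SLICE * (NZ_LOCAL + 2) * sizeof(double);
    double *d_uold, *d_unew;
    cudaMalloc((void **)&d_uold, bytes);
    cudaMalloc((void **)&d_unew, bytes);

    /* Pinned host buffers allow asynchronous, overlappable PCIe copies. */
    double *h_send_lo, *h_send_hi, *h_recv_lo, *h_recv_hi;
    cudaMallocHost((void **)&h_send_lo, SLICE * sizeof(double));
    cudaMallocHost((void **)&h_send_hi, SLICE * sizeof(double));
    cudaMallocHost((void **)&h_recv_lo, SLICE * sizeof(double));
    cudaMallocHost((void **)&h_recv_hi, SLICE * sizeof(double));

    cudaStream_t s_interior, s_halo;
    cudaStreamCreate(&s_interior);
    cudaStreamCreate(&s_halo);

    for (int step = 0; step < 100; ++step) {
        /* 1. Interior update needs no halo data; launch it first. */
        jacobi_interior<<<dim3(NX/16, NY/16, NZ_LOCAL - 2), dim3(16, 16, 1),
                          0, s_interior>>>(d_unew, d_uold);

        /* 2. Concurrently copy the outermost owned planes to the host. */
        cudaMemcpyAsync(h_send_lo, d_uold + SLICE,          SLICE * sizeof(double),
                        cudaMemcpyDeviceToHost, s_halo);
        cudaMemcpyAsync(h_send_hi, d_uold + SLICE*NZ_LOCAL, SLICE * sizeof(double),
                        cudaMemcpyDeviceToHost, s_halo);
        cudaStreamSynchronize(s_halo);

        /* 3. Exchange halos with neighbors while the GPU computes the interior. */
        MPI_Request req[4];
        MPI_Irecv(h_recv_lo, SLICE, MPI_DOUBLE, below, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(h_recv_hi, SLICE, MPI_DOUBLE, above, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(h_send_lo, SLICE, MPI_DOUBLE, below, 1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(h_send_hi, SLICE, MPI_DOUBLE, above, 0, MPI_COMM_WORLD, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        /* 4. Push received halos back to the device and finish the boundaries. */
        cudaMemcpyAsync(d_uold,                          h_recv_lo, SLICE * sizeof(double),
                        cudaMemcpyHostToDevice, s_halo);
        cudaMemcpyAsync(d_uold + SLICE*(NZ_LOCAL + 1),   h_recv_hi, SLICE * sizeof(double),
                        cudaMemcpyHostToDevice, s_halo);
        jacobi_boundary<<<dim3(NX/16, NY/16, 2), dim3(16, 16, 1),
                          0, s_halo>>>(d_unew, d_uold);

        cudaDeviceSynchronize();           /* both streams done before the swap */
        double *tmp = d_uold; d_uold = d_unew; d_unew = tmp;
    }

    cudaFree(d_uold); cudaFree(d_unew);
    cudaFreeHost(h_send_lo); cudaFreeHost(h_send_hi);
    cudaFreeHost(h_recv_lo); cudaFreeHost(h_recv_hi);
    MPI_Finalize();
    return 0;
}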
