Published in: Scientific Programming

Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters


Abstract

Graphics processing units (GPUs) offer strong floating-point performance and high memory bandwidth for data-parallel workloads and have been widely adopted in high-performance computing (HPC). The compute unified device architecture (CUDA) serves as a parallel computing platform and programming model that reduces the complexity of GPU programming. Programmable GPUs are becoming increasingly popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm combining the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+-up upwind scheme and the three-step Runge–Kutta method are used for spatial and temporal discretization, respectively. Turbulence is modeled with the k-ω SST two-equation model. The CPU manages only GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are applied to the GPU-based CFD codes. We propose a nonblocking communication method that fully overlaps GPU computing, CPU-CPU communication, and CPU-GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of a single-GPU implementation and the scalability of multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization achieves a speedup of more than 36× over CPU-based parallel computing, and that the parallel algorithm scales well.
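The one-dimensional domain decomposition mentioned above can be illustrated with a minimal sketch. This is not the authors' code: `decompose_1d` is a hypothetical helper that splits a grid of `n_cells` along one axis into per-GPU index ranges whose sizes differ by at most one cell, which is the usual way to balance workload in a 1D decomposition.

```python
def decompose_1d(n_cells, n_gpus):
    """Return (start, end) half-open index ranges, one per GPU rank.

    Cells are distributed as evenly as possible: the first
    `n_cells % n_gpus` ranks each receive one extra cell.
    """
    base, extra = divmod(n_cells, n_gpus)
    ranges = []
    start = 0
    for rank in range(n_gpus):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 10 cells over 4 GPUs: block sizes 3, 3, 2, 2
print(decompose_1d(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In the actual solver each rank would additionally exchange halo layers at its block boundaries with neighboring ranks via MPI, which is what the paper's nonblocking two-stream scheme overlaps with interior computation.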
