
An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters



Abstract

Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
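The overlap strategy named in the abstract can be illustrated with a minimal MPI-CUDA sketch. This is not the authors' code: the 1-D domain decomposition, the grid sizes NX and NY, and the placeholder kernels jacobi_boundary and jacobi_interior (left empty here) are illustrative assumptions. The pattern is the one the abstract describes: boundary rows are updated and staged to pinned host memory in one CUDA stream, halos are exchanged with non-blocking MPI while a second stream updates the interior, and received halos are copied back to the device before the buffers are swapped.

// Minimal sketch of overlapping halo exchange with interior computation
// (illustrative only; kernel bodies are placeholders).
#include <mpi.h>
#include <cuda_runtime.h>

#define NX 512   // points per row (assumed)
#define NY 512   // rows per rank, including two ghost rows (assumed)

__global__ void jacobi_boundary(double *u_new, const double *u) { /* update edge rows only */ }
__global__ void jacobi_interior(double *u_new, const double *u) { /* update interior rows */ }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up = (rank + 1) % size, down = (rank - 1 + size) % size;  // periodic neighbors

    double *d_u, *d_unew, *h_send, *h_recv;
    cudaMalloc(&d_u, NX * NY * sizeof(double));
    cudaMalloc(&d_unew, NX * NY * sizeof(double));
    cudaMemset(d_u, 0, NX * NY * sizeof(double));
    cudaMemset(d_unew, 0, NX * NY * sizeof(double));
    cudaMallocHost(&h_send, 2 * NX * sizeof(double));  // pinned buffers: bottom + top rows
    cudaMallocHost(&h_recv, 2 * NX * sizeof(double));

    cudaStream_t halo_stream, bulk_stream;
    cudaStreamCreate(&halo_stream);
    cudaStreamCreate(&bulk_stream);
    MPI_Request reqs[4];

    for (int step = 0; step < 100; ++step) {
        // 1. Update the boundary rows first, in the halo stream.
        jacobi_boundary<<<NX / 256 + 1, 256, 0, halo_stream>>>(d_unew, d_u);
        // 2. Stage the updated boundary rows to pinned host memory (asynchronous).
        cudaMemcpyAsync(h_send,      d_unew + NX,            NX * sizeof(double),
                        cudaMemcpyDeviceToHost, halo_stream);
        cudaMemcpyAsync(h_send + NX, d_unew + (NY - 2) * NX, NX * sizeof(double),
                        cudaMemcpyDeviceToHost, halo_stream);

        // 3. Launch the interior update concurrently in a second stream.
        jacobi_interior<<<(NX * NY) / 256 + 1, 256, 0, bulk_stream>>>(d_unew, d_u);

        // 4. Exchange halos with neighbors while the interior kernel runs.
        cudaStreamSynchronize(halo_stream);  // wait only for the halo copies
        MPI_Irecv(h_recv,      NX, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(h_recv + NX, NX, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(h_send,      NX, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(h_send + NX, NX, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &reqs[3]);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

        // 5. Push received halos back into the device ghost rows.
        cudaMemcpyAsync(d_unew,                 h_recv,      NX * sizeof(double),
                        cudaMemcpyHostToDevice, halo_stream);
        cudaMemcpyAsync(d_unew + (NY - 1) * NX, h_recv + NX, NX * sizeof(double),
                        cudaMemcpyHostToDevice, halo_stream);
        cudaDeviceSynchronize();

        double *tmp = d_u; d_u = d_unew; d_unew = tmp;  // swap time levels
    }

    cudaFree(d_u); cudaFree(d_unew);
    cudaFreeHost(h_send); cudaFreeHost(h_recv);
    MPI_Finalize();
    return 0;
}

Whether the staged copies and the interior kernel actually overlap depends on pinned host memory and the device's copy engines; the synchronization points above wait only on what each step strictly needs.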

