The Journal of Supercomputing

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms



Abstract

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are placed in different memory spaces on the GPU depending on their arithmetic intensity. This memory-hierarchy-specific implementation yields significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single- and double-precision computational performance on the GPU. Using a quad-GPU platform for single-precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform to substantially accelerate computational fluid dynamics (CFD) simulations.
