The Journal of Supercomputing

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms



Abstract

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can substantially accelerate compute-intensive simulation science applications. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are placed in different memory spaces on the GPU depending on their arithmetic intensity. This memory-hierarchy-specific implementation yields significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single- and double-precision computational performance on the GPU. Using a quad-GPU platform for single-precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective, small-footprint parallel computing platform to substantially accelerate computational fluid dynamics (CFD) simulations.
