
Multi-level parallelism for incompressible flow computations on GPU clusters



Abstract

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore the efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses either MPI or MPI-OpenMP for communications. We present three different strategies to overlap computations with communications, and systematically assess their impact on parallel performance on two different GPU clusters. Our results from strong and weak scaling analysis of incompressible flow computations demonstrate that GPU clusters offer significant benefits for large data sets, and that a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication provides substantial benefits in performance. We also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant performance advantage over the dual-level implementation on GPU clusters with two GPUs per node, but on clusters with higher GPU counts per node or with different domain decomposition strategies a tri-level implementation may exhibit higher efficiency than a dual-level implementation and needs to be investigated further.
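The overlapping of computation and communication described in the abstract typically follows a well-known MPI-CUDA pattern: update interior cells (which need no neighbor data) on one CUDA stream while ghost layers are staged to the host and exchanged with non-blocking MPI calls, then update boundary cells once the exchange completes. The sketch below illustrates that general pattern only; the kernel names, buffers, and decomposition are hypothetical and are not the authors' code.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Illustrative kernels (definitions omitted): stencil updates for the
// interior of the local subdomain and for the ghost-adjacent boundary slices.
__global__ void update_interior(double *u, int nx, int ny, int nz);
__global__ void update_boundary(double *u, int nx, int ny, int nz);

// One time step of a 1-D domain decomposition with computation/communication
// overlap. `up` and `down` are the neighboring MPI ranks; `ghost_count` is
// the number of values in one ghost layer.
void timestep(double *d_u, double *h_send, double *h_recv,
              int nx, int ny, int nz, int ghost_count,
              int up, int down,
              cudaStream_t compute_stream, cudaStream_t copy_stream)
{
    dim3 block(8, 8, 8);
    dim3 grid((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);
    MPI_Request reqs[2];

    // 1. Interior cells do not depend on neighbor data: launch immediately.
    update_interior<<<grid, block, 0, compute_stream>>>(d_u, nx, ny, nz);

    // 2. Concurrently stage the outgoing ghost layer to the host on a
    //    second stream (offset into d_u elided for brevity).
    cudaMemcpyAsync(h_send, d_u, ghost_count * sizeof(double),
                    cudaMemcpyDeviceToHost, copy_stream);
    cudaStreamSynchronize(copy_stream);

    // 3. Exchange ghost layers with neighbors using non-blocking MPI,
    //    hiding the transfer behind the interior kernel still in flight.
    MPI_Irecv(h_recv, ghost_count, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, ghost_count, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    // 4. Push received ghost data back to the GPU, then update the
    //    boundary cells. Sharing compute_stream with the interior kernel
    //    guarantees the boundary update runs after it completes.
    cudaMemcpyAsync(d_u, h_recv, ghost_count * sizeof(double),
                    cudaMemcpyHostToDevice, copy_stream);
    cudaStreamSynchronize(copy_stream);
    update_boundary<<<grid, block, 0, compute_stream>>>(d_u, nx, ny, nz);

    cudaStreamSynchronize(compute_stream);
}
```

Variants of this pattern differ mainly in how aggressively steps 2-3 are pushed behind step 1 (e.g., pinned host buffers for truly asynchronous copies), which is the kind of trade-off the paper's three overlap strategies explore.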

Bibliographic Information

  • Source
    Parallel Computing, 2013, Issue 1, pp. 1-20 (20 pages)
  • Author Affiliations

    Department of Computer Science, Boise State University, Boise, ID 83725, United States;

    Department of Mechanical and Biomedical Engineering, Boise State University, Boise, ID 83725, United States;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original Format: PDF
  • Language: English (eng)
  • CLC Classification:
  • Keywords

    GPU; Hybrid MPI-OpenMP-CUDA; fluid dynamics;


