
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters



Abstract

Data parallel architectures, such as General Purpose Graphics Processing Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in addition to MPI, the de facto parallel programming standard. Currently, data movement with the CUDA and MPI libraries is not integrated, and it is not as efficient as it could be. In addition, MPI-2 one-sided communication does not work for windows in GPU memory, as there is no way to remotely get or put data from GPU memory in a one-sided manner. In this paper, we propose a novel MPI design that integrates CUDA data movement transparently with MPI. The programmer is presented with one MPI interface that can communicate to and from GPUs. Data movement from the GPU and over the network can now be overlapped. The proposed design is incorporated into the MVAPICH2 library. To the best of our knowledge, this is the first work of its kind to enable advanced MPI features and optimized pipelining in a widely used MPI library. We observe up to 45% improvement in one-way latency. In addition, we show that collective communication performance can be improved significantly: 32%, 37% and 30% improvement for Scatter, Gather and Alltoall collective operations, respectively. Further, we enable MPI-2 one-sided communication with GPUs. We observe up to 45% improvement for Put and Get operations.
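The programming model described in the abstract can be illustrated with a minimal sketch (not taken from the paper) that contrasts the traditional staged transfer with the CUDA-aware path MVAPICH2-GPU enables, in which a GPU device pointer is passed directly to MPI and the library handles the staging internally. The buffer size, two-process layout, and the TRADITIONAL_PATH switch are illustrative assumptions; error checking is omitted.

/*
 * Sketch: sending a GPU-resident buffer from rank 0 to rank 1.
 * Assumes a CUDA-aware MPI build and one process per GPU.
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define N (1 << 20)   /* illustrative message size: 1M doubles */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *d_buf;                                   /* buffer in GPU memory */
    cudaMalloc((void **)&d_buf, N * sizeof(double));

#ifdef TRADITIONAL_PATH
    /* Without integration: the application stages data through host memory
     * explicitly, serializing the PCIe copy and the network transfer. */
    double *h_buf = (double *)malloc(N * sizeof(double));
    if (rank == 0) {
        cudaMemcpy(h_buf, d_buf, N * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(h_buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, N * sizeof(double), cudaMemcpyHostToDevice);
    }
    free(h_buf);
#else
    /* With the integrated design: the device pointer is passed to MPI
     * directly, and the library moves the data for the application. */
    if (rank == 0)
        MPI_Send(d_buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
#endif

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

In the integrated path the library can split a large message into chunks so that the device-to-host copy of one chunk overlaps with the network transfer of the previous one, which is the pipelined overlap of GPU and network data movement the abstract refers to.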

Bibliographic details

  • Source
    Computer science, 2011, Issue 4, pp. 257-266 (10 pages)
  • Author affiliation

    Department of Computer Science and Engineering, The Ohio State University, Columbus, USA

  • Indexing information
  • Format: PDF
  • Language: English (eng)
  • CLC classification
  • Keywords

    MPI; clusters; GPGPU; CUDA; InfiniBand;

  • Date added: 2022-08-17 13:50:21
