Concurrency and Computation: Practice and Experience
Design considerations for GPU-aware collective communications in MPI

Abstract

GPU accelerators have established themselves in state-of-the-art clusters by offering high performance and energy efficiency. In such systems, efficient inter-process GPU communication is of paramount importance to application performance. This paper investigates various algorithms in conjunction with the latest GPU features to improve GPU collective operations. First, we propose a GPU Shared Buffer-aware (GSB) algorithm and a Binomial Tree Based (BTB) algorithm for GPU collectives on single-GPU nodes. We then propose a hierarchical framework for clusters with multi-GPU nodes. By studying various combinations of algorithms, we highlight the importance of choosing the right algorithm within each level. The evaluation of our framework on MPI_Allreduce shows promising performance results for large message sizes. To address its shortcoming on small and medium messages, we demonstrate the benefit of the Hyper-Q feature and the MPS service when CUDA IPC and host-staged copy types are used jointly to perform multiple inter-process communications. However, we argue that efficient designs are still required to further harness this potential. Accordingly, we propose a static and a dynamic algorithm for MPI_Allgather and MPI_Allreduce and present their effectiveness on various message sizes. Our profiling results indicate that the achieved performance is mainly rooted in overlapping the different copy types.
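The small- and medium-message designs above hinge on issuing CUDA IPC and host-staged copies together. As a rough illustration (not the paper's implementation), the following MPI+CUDA sketch contrasts the two intra-node copy types between two ranks on one node; the buffer size, message tags, and one-GPU-per-rank mapping are illustrative assumptions.

    /* Hypothetical sketch: direct CUDA IPC copy vs. host-staged copy
     * between two MPI ranks sharing a node. Error checking omitted. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define N (1 << 20)  /* floats per buffer; illustrative size */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaSetDevice(rank);  /* assumes one GPU per rank on this node */

        float *dbuf;
        cudaMalloc(&dbuf, N * sizeof(float));
        cudaMemset(dbuf, 0, N * sizeof(float));

        if (rank == 0) {
            /* Copy type 1: export the device buffer as an IPC handle so
             * the peer can pull it directly, GPU to GPU. */
            cudaIpcMemHandle_t handle;
            cudaIpcGetMemHandle(&handle, dbuf);
            MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);

            /* Copy type 2: host-staged -- drain to host, ship via MPI. */
            float *hbuf = (float *)malloc(N * sizeof(float));
            cudaMemcpy(hbuf, dbuf, N * sizeof(float), cudaMemcpyDeviceToHost);
            MPI_Send(hbuf, N, MPI_FLOAT, 1, 1, MPI_COMM_WORLD);
            free(hbuf);
        } else if (rank == 1) {
            /* Copy type 1: open the peer's buffer, copy device-to-device. */
            cudaIpcMemHandle_t handle;
            MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            void *peer;
            cudaIpcOpenMemHandle(&peer, handle, cudaIpcMemLazyEnablePeerAccess);
            cudaMemcpy(dbuf, peer, N * sizeof(float), cudaMemcpyDeviceToDevice);
            cudaIpcCloseMemHandle(peer);

            /* Copy type 2: receive the staged data and upload it. */
            float *hbuf = (float *)malloc(N * sizeof(float));
            MPI_Recv(hbuf, N, MPI_FLOAT, 0, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            cudaMemcpy(dbuf, hbuf, N * sizeof(float), cudaMemcpyHostToDevice);
            free(hbuf);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }

In a full collective, such paths would be launched concurrently (for example, on separate CUDA streams, with Hyper-Q and MPS allowing the copies to proceed in parallel), which is the overlap of copy types that the profiling results credit for the performance.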
