Concurrency and Computation: Practice and Experience
Design considerations for GPU-aware collective communications in MPI

Abstract

GPU accelerators have established themselves in state-of-the-art clusters by offering high performance and energy efficiency. In such systems, efficient inter-process GPU communication is of paramount importance to application performance. This paper investigates various algorithms in conjunction with the latest GPU features to improve GPU collective operations. First, we propose a GPU Shared Buffer-aware (GSB) algorithm and a Binomial Tree Based (BTB) algorithm for GPU collectives on single-GPU nodes. We then propose a hierarchical framework for clusters with multi-GPU nodes. By studying various combinations of algorithms, we highlight the importance of choosing the right algorithm within each level. The evaluation of our framework on MPI_Allreduce shows promising performance results for large message sizes. To address its shortcoming on small and medium messages, we demonstrate the benefit of the Hyper-Q feature and the MPS service when CUDA IPC and host-staged copy types are used jointly to perform multiple inter-process communications. However, we argue that efficient designs are still required to further harness this potential. Accordingly, we propose a static and a dynamic algorithm for MPI_Allgather and MPI_Allreduce and present their effectiveness on various message sizes. Our profiling results indicate that the achieved performance is mainly rooted in overlapping the different copy types.
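The small- and medium-message designs above hinge on issuing CUDA IPC and host-staged copies together. As a rough illustration (not the paper's implementation), the following MPI+CUDA sketch contrasts the two intra-node copy types between two ranks on one node; the buffer size, message tags, and one-GPU-per-rank mapping are illustrative assumptions.

    /* Hypothetical sketch: direct CUDA IPC copy vs. host-staged copy
     * between two MPI ranks sharing a node. Error checking omitted. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define N (1 << 20)  /* floats per buffer; illustrative size */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaSetDevice(rank);  /* assumes one GPU per rank on this node */

        float *dbuf;
        cudaMalloc(&dbuf, N * sizeof(float));
        cudaMemset(dbuf, 0, N * sizeof(float));

        if (rank == 0) {
            /* Copy type 1: export the device buffer as an IPC handle so
             * the peer can pull it directly, GPU to GPU. */
            cudaIpcMemHandle_t handle;
            cudaIpcGetMemHandle(&handle, dbuf);
            MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);

            /* Copy type 2: host-staged -- drain to host, ship via MPI. */
            float *hbuf = (float *)malloc(N * sizeof(float));
            cudaMemcpy(hbuf, dbuf, N * sizeof(float), cudaMemcpyDeviceToHost);
            MPI_Send(hbuf, N, MPI_FLOAT, 1, 1, MPI_COMM_WORLD);
            free(hbuf);
        } else if (rank == 1) {
            /* Copy type 1: open the peer's buffer, copy device-to-device. */
            cudaIpcMemHandle_t handle;
            MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            void *peer;
            cudaIpcOpenMemHandle(&peer, handle, cudaIpcMemLazyEnablePeerAccess);
            cudaMemcpy(dbuf, peer, N * sizeof(float), cudaMemcpyDeviceToDevice);
            cudaIpcCloseMemHandle(peer);

            /* Copy type 2: receive the staged data and upload it. */
            float *hbuf = (float *)malloc(N * sizeof(float));
            MPI_Recv(hbuf, N, MPI_FLOAT, 0, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            cudaMemcpy(dbuf, hbuf, N * sizeof(float), cudaMemcpyHostToDevice);
            free(hbuf);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }

In a full collective, such paths would be launched concurrently (for example, on separate CUDA streams, with Hyper-Q and MPS allowing the copies to proceed in parallel), which is the overlap of copy types that the profiling results credit for the performance.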
