Conference: IEEE International Conference on Cluster Computing

Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters

Abstract

For parallel HPC applications running on GPU clusters, high communication latency between GPUs on different nodes is a serious obstacle to strong scalability. To reduce this latency, we proposed the Tightly Coupled Accelerator (TCA) architecture and developed the PEACH2 board as a proof-of-concept interconnect for TCA. Although PEACH2 provides very low communication latency, its dependence on PCIe technology imposes hardware limitations, such as the practical number of nodes in a system, which is currently 16; such a group of nodes is called a sub-cluster. Larger numbers of nodes must be connected by a conventional interconnect such as InfiniBand, so the entire network is configured as a hybrid of a global conventional network and a local high-speed network based on PEACH2. For ease of programming, it is desirable to operate such a complicated communication system at the library or language level, which hides the system from the user. In this paper, we develop a hybrid interconnection network system combining PEACH2 and InfiniBand, and implement it in XcalableACC (XACC), a high-level PGAS language for accelerated clusters. A preliminary performance evaluation using the Himeno benchmark for stencil computation confirms that the hybrid network improves performance by up to 40% relative to MVAPICH2 with GDR on InfiniBand. Additionally, Allgather collective communication over the hybrid network improves performance by up to 50% for networks of 8 to 16 nodes. The combination of local communication, supported by the low latency of PEACH2, and global communication, supported by the high bandwidth and scalability of InfiniBand, improves overall performance.
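The local/global split described in the abstract can be illustrated with a minimal sketch. This is not the XACC runtime's actual logic, which the abstract does not describe. It assumes, purely for illustration, that nodes are numbered consecutively and that each sub-cluster is a contiguous block of 16 node IDs. The names `subcluster_of` and `choose_interconnect` are hypothetical.

```python
# Hypothetical sketch: pick an interconnect for a point-to-point transfer.
# Assumption (not from the paper): node IDs are consecutive integers and
# each PEACH2 sub-cluster is a contiguous block of SUBCLUSTER_SIZE nodes.

SUBCLUSTER_SIZE = 16  # practical sub-cluster size stated in the abstract


def subcluster_of(node_id: int, size: int = SUBCLUSTER_SIZE) -> int:
    """Return the index of the sub-cluster that contains node_id."""
    if node_id < 0:
        raise ValueError("node_id must be non-negative")
    return node_id // size


def choose_interconnect(src: int, dst: int, size: int = SUBCLUSTER_SIZE) -> str:
    """Use the low-latency local network inside a sub-cluster,
    and the conventional global network between sub-clusters."""
    if subcluster_of(src, size) == subcluster_of(dst, size):
        return "PEACH2"
    return "InfiniBand"


if __name__ == "__main__":
    for src, dst in [(0, 15), (15, 16), (17, 31)]:
        print(src, dst, choose_interconnect(src, dst))
```

In this sketch, nodes 0 and 15 share a sub-cluster and would communicate over PEACH2, while nodes 15 and 16 fall into different sub-clusters and would use InfiniBand. Hiding such a choice behind a library or language layer is the motivation the abstract gives for implementing the hybrid network in XACC.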
