Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters

Abstract

When parallel HPC applications run on GPU clusters, high communication latency between GPUs on different nodes is a serious obstacle to strong scalability. To reduce this latency, we proposed the Tightly Coupled Accelerator (TCA) architecture and developed the PEACH2 board as a proof-of-concept interconnect for TCA. Although PEACH2 provides very low communication latency, its reliance on PCIe technology imposes hardware limitations, such as the practical number of nodes in a system, which is currently 16; such a group of nodes is called a sub-cluster. Larger numbers of nodes must be connected by a conventional interconnect such as InfiniBand, so the entire network is configured as a hybrid of a global conventional network and a local high-speed network based on PEACH2. For ease of programming, it is desirable to operate such a complicated communication system at the library or language level, which hides the system from the user. In this paper, we develop a hybrid interconnection network system combining PEACH2 and InfiniBand, and implement it in XcalableACC (XACC), a high-level PGAS language for accelerated clusters. A preliminary performance evaluation confirms that the hybrid network improves the performance of the Himeno benchmark, a stencil computation, by up to 40% relative to MVAPICH2 with GDR on InfiniBand. Additionally, Allgather collective communication over the hybrid network improves performance by up to 50% on 8 to 16 nodes. The combination of local communication, supported by the low latency of PEACH2, and global communication, supported by the high bandwidth and scalability of InfiniBand, improves overall performance.
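The two-level network described in the abstract can be illustrated with a minimal sketch. It assumes that nodes are numbered contiguously and grouped into sub-clusters of 16, and that a transfer is routed purely by sub-cluster membership. The function names and the transport labels are hypothetical; the abstract does not describe the selection policy that the XACC runtime actually uses.

```python
# Illustrative sketch only: names and routing policy are assumptions,
# not the XcalableACC runtime's actual implementation.

SUB_CLUSTER_SIZE = 16  # practical PEACH2 node limit stated in the abstract


def sub_cluster_of(node: int, size: int = SUB_CLUSTER_SIZE) -> int:
    """Return the index of the sub-cluster a node belongs to,
    assuming contiguous node numbering (an assumption)."""
    return node // size


def select_interconnect(src: int, dst: int, size: int = SUB_CLUSTER_SIZE) -> str:
    """Pick a hypothetical transport label for a transfer between two nodes."""
    if src == dst:
        return "local"  # same node: no inter-node network involved
    if sub_cluster_of(src, size) == sub_cluster_of(dst, size):
        return "PEACH2"  # local, low-latency network inside a sub-cluster
    return "InfiniBand"  # global, conventional network across sub-clusters


if __name__ == "__main__":
    for pair in [(0, 15), (0, 16), (17, 30), (3, 3)]:
        print(pair, select_interconnect(*pair))
```

Under these assumptions, nodes 0 and 15 share a sub-cluster and would communicate over PEACH2, while nodes 0 and 16 fall in different sub-clusters and would use InfiniBand. A real runtime could also weigh factors such as message size, which this sketch ignores.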

Bibliographic details

Source
IEEE International Conference on Cluster Computing | 2015 | pp. 627-634 | 8 pages
Authors
Odajima Tetsuya; Boku Taisuke; Hanawa Toshihiro; Murai Hitoshi; Nakao Masahiro; Tabuchi Akihiro; Sato Mitsuhisa;
Original format: PDF
Keywords
graphics processing units; high level languages; parallel programming; GPU cluster; Himeno benchmark; InfiniBand; PCIe technology; PEACH2; TCA architecture; TCA interconnection system; XcalableACC; XcalableACC language; graphics processing unit; high performance computing; high-level PGAS language; parallel HPC application; parallel programming language; tightly coupled accelerator; user programmability; Arrays; Communication systems; Electronics packaging; Graphics processing units; Programming; Scalability; Accelerator; GPU Cluster; Interconnect; PGAS Language; Tightly Coupled Accelerators; XcalableACC;

Similar literature

1. Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster [J]. Nakao Masahiro, Odajima Tetsuya, Murai Hitoshi, International Journal of High Performance Computing Applications. 2019, no. 5
2. Dynamic Work Load Balancing for Compute Intensive Application Using Parallel and Hybrid Programming Models on CPU-GPU Cluster [J]. Chandrashekhar B. N, Sanjay H. A, Journal of computational and theoretical nanoscience. 2018, no. 6a7
3. Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters [J]. Yang C.-T., Huang C.-L., Lin C.-F., Computer physics communications. 2011, no. 1
4. Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters [C]. Odajima Tetsuya, Boku Taisuke, Hanawa Toshihiro, IEEE International Conference on Cluster Computing. 2015
5. Framework for parallelization of programs on GPUs [D]. P. Kumar, Raghu Raj. 2016
6. Novel Hybrid GPU–CPU Implementation of Parallelized Monte Carlo Parametric Expectation Maximization Estimation Method for Population Pharmacokinetic Data Analysis [O]. C. M. Ng, 2013
7. Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster [O]. Masahiro Nakao, Tetsuya Odajima, Hitoshi Murai, 2019
