IEEE Transactions on Parallel and Distributed Systems

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast

Abstract

Broadcast is a widely used operation in many streaming and deep learning applications to disseminate large amounts of data on emerging heterogeneous High-Performance Computing (HPC) systems. However, traditional broadcast schemes do not fully utilize hardware features for Graphics Processing Unit (GPU)-based applications. In this paper, a model-oriented analysis is presented to identify performance bottlenecks of existing broadcast schemes on GPU clusters. Next, streaming-based broadcast schemes are proposed that exploit InfiniBand hardware multicast (IB-MCAST) and NVIDIA GPUDirect technology for efficient message transmission. The proposed designs are evaluated using Message Passing Interface (MPI) based benchmarks and applications. The experimental results indicate improved scalability and up to an 82 percent reduction in latency compared to state-of-the-art solutions in the benchmark-level evaluation. Furthermore, compared to the state-of-the-art, the proposed design yields consistently higher throughput for a synthetic streaming workload and 1.3x faster training for a deep learning framework.
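
The benchmark-level evaluation described in the abstract centers on MPI broadcast latency over GPU-resident buffers. As a point of reference, the sketch below shows a minimal CUDA-aware MPI_Bcast latency loop of the kind such evaluations typically use. The message size, iteration count, and timing scheme are illustrative assumptions, not the authors' actual harness; a CUDA-aware MPI library (e.g., MVAPICH2-GDR, which supports GPUDirect RDMA) is assumed so that device pointers can be passed to MPI directly.

```c
/* Minimal sketch of a CUDA-aware MPI broadcast latency benchmark.
 * Message size, iteration count, and timing are illustrative
 * assumptions, not the paper's evaluation harness. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t msg_size = 1 << 20;   /* 1 MiB message (assumption) */
    const int iters = 100;             /* iterations (assumption) */

    /* With a CUDA-aware MPI, a device pointer can be handed directly
     * to MPI_Bcast; GPUDirect RDMA then lets the InfiniBand HCA read
     * and write GPU memory without staging through host buffers. */
    void *d_buf;
    cudaMalloc(&d_buf, msg_size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Bcast(d_buf, (int)msg_size, MPI_BYTE, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg broadcast latency: %.2f us\n",
               (t1 - t0) / iters * 1e6);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and linked against the CUDA runtime (-lcudart), a loop like this is what distinguishes the schemes compared in the paper: a conventional host-staged broadcast copies the GPU buffer through host memory on every rank, while the proposed IB-MCAST plus GPUDirect RDMA path avoids that staging, which is where the reported latency reduction comes from.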