Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications

Abstract

Streaming applications, which are data-intensive, have been extensively run on High-Performance Computing (HPC) systems to achieve higher performance and scalability. These applications typically use broadcast operations to disseminate real-time data from a single source to multiple workers, each of which is a multi-GPU computing site. State-of-the-art broadcast operations take advantage of InfiniBand (IB) hardware multicast (MCAST) and NVIDIA GPUDirect features to boost inter-node communication performance and scalability. The IB MCAST feature works only with the IB Unreliable Datagram (UD) transport and consequently provides unreliable communication to applications, so higher-level libraries and/or runtime environments must provide reliability explicitly. However, handling reliability at that level can become a performance bottleneck for streaming applications. In this paper, we analyze the specific requirements of streaming applications and the performance bottlenecks involved in handling reliability. We show that the traditional Negative-Acknowledgement (NACK) based approach requires the broadcast sender to retransmit lost packets, which degrades streaming throughput. To alleviate this issue, we propose a novel Remote Memory Access (RMA) based scheme that provides high-performance reliability support at the MPI level. In the proposed scheme, the receivers themselves (rather than the sender) retrieve lost packets through RMA operations. Furthermore, we provide an analytical model of the memory requirements of the proposed RMA-based scheme. Our experimental results show that the proposed scheme introduces nearly no overhead compared to existing solutions. In a micro-benchmark with injected failures (to simulate unreliable network environments), the proposed scheme reduces latency by up to 45% compared to the existing NACK-based scheme. Moreover, with a synthetic streaming benchmark on a GPU-dense Cray CS-Storm system with up to 88 NVIDIA K80 GPU cards, our design achieves up to a 56% higher broadcast rate than the traditional NACK-based scheme.
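The abstract contrasts sender-driven (NACK) and receiver-driven (RMA) loss recovery. The toy Python model below is only a sketch of that contrast, not the authors' MPI implementation: it counts how many packets the broadcast root must put on the wire under each scheme. It assumes that losses are independent per packet and per receiver, that NACK repairs are unicast retransmissions that always succeed, and that the root keeps all data readable for the whole stream (the paper's analytical model of how much memory that really takes is not reproduced here). The names `inject_losses`, `multicast_pass`, `nack_recovery`, and `rma_recovery` are hypothetical, and the tuple `window` merely stands in for an MPI RMA window read with `MPI_Get`.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Result:
    sender_transmissions: int             # packets the broadcast root itself sends
    receiver_reads: int                   # one-sided reads issued by receivers
    streams: List[List[Optional[bytes]]]  # final per-receiver copy of the stream


def inject_losses(num_packets: int, num_receivers: int,
                  loss_prob: float, seed: int) -> List[Set[int]]:
    """Drop each packet at each receiver independently with probability loss_prob."""
    rng = random.Random(seed)
    return [{seq for seq in range(num_packets) if rng.random() < loss_prob}
            for _ in range(num_receivers)]


def multicast_pass(payloads: List[bytes],
                   lost: List[Set[int]]) -> List[List[Optional[bytes]]]:
    """One unreliable multicast: each receiver gets every packet except its lost ones."""
    return [[None if seq in missing else pkt for seq, pkt in enumerate(payloads)]
            for missing in lost]


def nack_recovery(payloads: List[bytes], lost: List[Set[int]]) -> Result:
    """Sender-driven repair: each receiver NACKs every hole and the root resends it."""
    streams = multicast_pass(payloads, lost)
    retransmissions = 0
    for stream in streams:
        for seq, pkt in enumerate(stream):
            if pkt is None:                  # receiver detects a gap and NACKs seq
                stream[seq] = payloads[seq]  # root retransmits (assumed unicast and reliable)
                retransmissions += 1
    return Result(len(payloads) + retransmissions, 0, streams)


def rma_recovery(payloads: List[bytes], lost: List[Set[int]]) -> Result:
    """Receiver-driven repair: the root only multicasts and keeps its data exposed;
    receivers fetch their own holes with one-sided reads."""
    window = tuple(payloads)  # stand-in for an RMA window the root keeps readable
    streams = multicast_pass(payloads, lost)
    reads = 0
    for stream in streams:
        for seq, pkt in enumerate(stream):
            if pkt is None:
                stream[seq] = window[seq]  # stand-in for MPI_Get; the root does no work
                reads += 1
    return Result(len(payloads), reads, streams)


# Usage: 1000 packets, 8 receivers, 1% independent loss per packet per receiver.
payloads = [f"frame-{i}".encode() for i in range(1000)]
lost = inject_losses(len(payloads), num_receivers=8, loss_prob=0.01, seed=1)
nack = nack_recovery(payloads, lost)
rma = rma_recovery(payloads, lost)
print("NACK root sends:", nack.sender_transmissions,
      "RMA root sends:", rma.sender_transmissions)
```

In this model the root's send count grows with the total number of losses under NACK but stays fixed under the receiver-driven scheme, which mirrors the throughput argument in the abstract; the recovery cost moves to the receivers' `receiver_reads` and to the memory the root must keep exposed.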


Bibliographic Information

Source
2016 First Workshop on Optimization of Communication in HPC, 2016, pp. 29-38 (10 pages)
Conference location: Salt Lake City (US)
Authors
C.-H. Chu; K. Hamidouche; H. Subramoni; A. Venkatesh; B. Elton; D. K. Panda
Author Affiliations

Engility Corp., Dayton, OH, USA;

Engility Corp., Dayton, OH, USA;

Engility Corp., Dayton, OH, USA;

Engility Corp., Dayton, OH, USA;

Engility Corp., Dayton, OH, USA;

Engility Corp., Dayton, OH, USA;

Original format: PDF
Language: English
Keywords
Hardware; Receivers; Graphics processing units; Scalability; Libraries; Reliability engineering;


Similar Literature

1. Benefits of Adding Hardware Support for Broadcast and Reduce Operations in MPSoC Applications [J]. Yuanxi Peng, Manuel Saldana, Christopher A. Madill. ACM Transactions on Reconfigurable Technology and Systems, 2014, No. 3.
2. Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures [J]. Marco Ceriani, Simone Secchi, Oreste Villa. IEEE Transactions on Parallel and Distributed Systems, 2017, No. 6.
3. Efficient communication support in predictable heterogeneous MPSoC designs for streaming applications [J]. Yifan He, Dongrui She, Sander Stuijk. Journal of Systems Architecture, 2013, No. 10aPta1.
4. Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications [C]. C.-H. Chu, K. Hamidouche, H. Subramoni. International Conference for High Performance Computing, Networking, Storage and Analysis, 2016.
5. Efficient broadcast of XML streams in a push based environment [D]. Vamsi Krishna Chaluvadi, 2003.
6. The Stream Algorithm: Computationally Efficient Ridge-Regression via Bayesian Model Averaging and Applications to Pharmacogenomic Prediction of Cancer Cell Line Sensitivity [O]. Elias Chaibub Neto, In Sock Jang, Stephen H. Friend.
7. On Supporting Power-Efficient Streaming Applications in Wireless Environments [O]. Hao Zhu, Guohong Cao, 2005.
