Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications

Abstract

Streaming applications, which are data-intensive, have been extensively run on High-Performance Computing (HPC) systems to achieve higher performance and scalability. These applications typically use broadcast operations to disseminate real-time data from a single source to multiple workers, each of which is a multi-GPU computing site. State-of-the-art broadcast operations take advantage of InfiniBand (IB) hardware multicast (MCAST) and NVIDIA GPUDirect features to boost inter-node communication performance and scalability. The IB MCAST feature works only with the IB Unreliable Datagram (UD) transport and consequently provides unreliable communication to applications. Higher-level libraries and/or runtime environments must therefore provide reliability explicitly. However, handling reliability at that level can become a performance bottleneck for streaming applications. In this paper, we analyze the specific requirements of streaming applications and the performance bottlenecks involved in handling reliability. We show that the traditional Negative-Acknowledgement (NACK) based approach requires the broadcast sender to retransmit lost packets, degrading streaming throughput. To alleviate this issue, we propose a novel Remote Memory Access (RMA) based scheme that provides high-performance reliability support at the MPI level. In the proposed scheme, the receivers themselves (rather than the sender) retrieve lost packets through RMA operations. Furthermore, we provide an analytical model of the memory requirements of the proposed RMA-based scheme. Our experimental results show that the proposed scheme introduces nearly no overhead compared to existing solutions. In a micro-benchmark with injected failures (to simulate unreliable network environments), the proposed scheme shows up to a 45% reduction in latency compared to the existing NACK-based scheme.
Moreover, with a synthetic streaming benchmark, our design shows up to a 56% higher broadcast rate than the traditional NACK-based scheme on a GPU-dense Cray CS-Storm system with up to 88 NVIDIA K80 GPU cards.
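
The difference between the two recovery paths described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes (hypothetically) that the sender keeps every multicast packet in a buffer that receivers can read directly, standing in for an RMA window, and the names `Sender`, `handle_nack`, `rma_get`, and `broadcast` are illustrative, not MPI calls. It also counts one sender retransmission per NACK, a simplification; real NACK-based designs may coalesce NACKs or re-multicast lost packets.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    # Hypothetical model: the sender keeps a copy of every multicast packet,
    # standing in for a buffer exposed to receivers through an RMA window.
    buffer: dict = field(default_factory=dict)
    retransmissions: int = 0  # recoveries that required sender-side work

    def multicast(self, seq, payload):
        self.buffer[seq] = payload

    def handle_nack(self, seq):
        # NACK path: the sender itself serves every lost packet
        # (simplified to one retransmission per NACK).
        self.retransmissions += 1
        return self.buffer[seq]


def rma_get(sender, seq):
    # RMA path: the receiver reads the packet directly from the exposed
    # buffer; no sender-side work is counted.
    return sender.buffer[seq]


def broadcast(payloads, losses, scheme):
    """Broadcast payloads to len(losses) receivers over an unreliable channel.

    losses[r] is the set of sequence numbers receiver r misses during the
    multicast phase. Returns (sender retransmissions, per-receiver streams).
    """
    if scheme not in ("nack", "rma"):
        raise ValueError(f"unknown scheme: {scheme}")
    sender = Sender()
    received = [{} for _ in losses]
    # Unreliable multicast phase: some receivers drop some packets.
    for seq, payload in enumerate(payloads):
        sender.multicast(seq, payload)
        for r, lost in enumerate(losses):
            if seq not in lost:
                received[r][seq] = payload
    # Recovery phase: fill every gap using the chosen scheme.
    for stream in received:
        for seq in range(len(payloads)):
            if seq not in stream:
                if scheme == "nack":
                    stream[seq] = sender.handle_nack(seq)
                else:
                    stream[seq] = rma_get(sender, seq)
    streams = [[stream[seq] for seq in range(len(payloads))] for stream in received]
    return sender.retransmissions, streams


payloads = [b"p0", b"p1", b"p2", b"p3"]
losses = [{1}, {1, 3}, set()]  # receiver 2 loses nothing
for scheme in ("nack", "rma"):
    resent, streams = broadcast(payloads, losses, scheme)
    print(scheme, resent, all(s == payloads for s in streams))
# prints "nack 3 True", then "rma 0 True"
```

With this loss pattern both schemes deliver the complete stream to every receiver, but only the NACK path makes the sender resend packets, which is the sender-side work the abstract identifies as degrading streaming throughput. The sketch keeps every packet indefinitely, so it ignores the buffer-memory question that the paper's analytical model addresses.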

Bibliographic Record

Source
International Conference for High Performance Computing, Networking, Storage and Analysis | 2016 | 1 vol. | 10 pages
Authors
C.-H. Chu; K. Hamidouche; H. Subramoni; A. Venkatesh; B. Elton; D. K. Panda
Original format: PDF
Chinese Library Classification: Computing technology; computer technology
Keywords
Hardware; Receivers; Graphics processing units; Scalability; Libraries; Reliability engineering

Similar Literature

1. Benefits of Adding Hardware Support for Broadcast and Reduce Operations in MPSoC Applications [J]. Yuanxi Peng, Manuel Saldana, Christopher A. Madill. ACM Transactions on Reconfigurable Technology and Systems, 2014, no. 3

2. Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures [J]. Marco Ceriani, Simone Secchi, Oreste Villa. IEEE Transactions on Parallel and Distributed Systems, 2017, no. 6

3. Efficient communication support in predictable heterogeneous MPSoC designs for streaming applications [J]. Yifan He, Dongrui She, Sander Stuijk. Journal of Systems Architecture, 2013, no. 10aPta1

4. Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications [C]. C.-H. Chu, K. Hamidouche, H. Subramoni. 2016 First Workshop on Optimization of Communication in HPC, 2016

5. Efficient broadcast of XML streams in a push based environment [D]. Vamsi Krishna Chaluvadi. 2003

6. The Stream Algorithm: Computationally Efficient Ridge-Regression via Bayesian Model Averaging and Applications to Pharmacogenomic Prediction of Cancer Cell Line Sensitivity [O]. Elias Chaibub Neto, In Sock Jang, Stephen H. Friend

7. On Supporting Power-Efficient Streaming Applications in Wireless Environments [O]. Hao Zhu, Guohong Cao. 2005
