Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications

Abstract

Data-intensive streaming applications have been run extensively on High-Performance Computing (HPC) systems in pursuit of higher performance and scalability. These applications typically use broadcast operations to disseminate real-time data from a single source to multiple workers, each being a multi-GPU computing site. State-of-the-art broadcast operations take advantage of InfiniBand (IB) hardware multicast (MCAST) and NVIDIA GPUDirect features to boost inter-node communication performance and scalability. The IB MCAST feature works only with the IB Unreliable Datagram (UD) mechanism and consequently provides unreliable communication to applications, so higher-level libraries and/or runtime environments must provide reliability explicitly. However, handling reliability at that level can be a performance bottleneck for streaming applications. In this paper, we analyze the specific requirements of streaming applications and the performance bottlenecks involved in handling reliability. We show that the traditional Negative-Acknowledgement (NACK) based approach requires the broadcast sender to retransmit lost packets, which degrades streaming throughput. To alleviate this issue, we propose a novel Remote Memory Access (RMA) based scheme that provides high-performance reliability support at the MPI level. In the proposed scheme, the receivers themselves (as opposed to the sender) retrieve lost packets through RMA operations. Furthermore, we provide an analytical model to illustrate the memory requirements of the proposed RMA-based scheme. Our experimental results show that the proposed scheme introduces nearly no overhead compared to the existing solutions. In a micro-benchmark with injected failures (to simulate unreliable network environments), the proposed scheme shows up to a 45% reduction in latency compared to the existing NACK-based scheme. Moreover, with a synthetic streaming benchmark, our design shows up to a 56% higher broadcast rate than the traditional NACK-based scheme on a GPU-dense Cray CS-Storm system with up to 88 NVIDIA K80 GPU cards.
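The contrast between sender-driven (NACK) and receiver-driven (RMA) recovery can be illustrated with a toy model. The sketch below is not the paper's implementation: it uses no MPI or InfiniBand calls, and every name in it (`simulate_losses`, `recover_with_nack`, `recover_with_rma`, `exposed_buffer`) is hypothetical. It assumes packets are dropped independently per receiver and only counts who does the recovery work.

```python
import random


def simulate_losses(num_packets, num_receivers, loss_rate, seed=0):
    """Return one set per receiver holding the sequence numbers it lost."""
    rng = random.Random(seed)
    return [
        {seq for seq in range(num_packets) if rng.random() < loss_rate}
        for _ in range(num_receivers)
    ]


def recover_with_nack(packets, losses):
    """Sender-driven recovery: every lost packet costs the sender a resend."""
    sender_retransmissions = 0
    delivered = []
    for lost in losses:
        # Packets that arrived through the unreliable multicast.
        received = {seq: data for seq, data in enumerate(packets) if seq not in lost}
        for seq in sorted(lost):
            # The receiver reports the gap; the sender resends the packet.
            received[seq] = packets[seq]
            sender_retransmissions += 1
        delivered.append([received[seq] for seq in range(len(packets))])
    return delivered, sender_retransmissions


def recover_with_rma(packets, losses):
    """Receiver-driven recovery: receivers read gaps from an exposed buffer."""
    exposed_buffer = list(packets)  # assumed stand-in for a remotely readable window
    sender_retransmissions = 0      # the sender never resends in this model
    receiver_reads = 0
    delivered = []
    for lost in losses:
        received = {seq: data for seq, data in enumerate(packets) if seq not in lost}
        for seq in sorted(lost):
            # The receiver fetches the missing packet itself.
            received[seq] = exposed_buffer[seq]
            receiver_reads += 1
        delivered.append([received[seq] for seq in range(len(packets))])
    return delivered, sender_retransmissions, receiver_reads


if __name__ == "__main__":
    packets = [f"chunk-{i}" for i in range(100)]
    losses = simulate_losses(len(packets), num_receivers=4, loss_rate=0.05)
    _, nack_cost = recover_with_nack(packets, losses)
    _, rma_cost, reads = recover_with_rma(packets, losses)
    print("sender retransmissions (NACK):", nack_cost)
    print("sender retransmissions (RMA):", rma_cost, "receiver reads:", reads)
```

In this model both schemes deliver every packet, but the NACK variant charges each loss to the sender while the RMA variant charges it to the receiver that missed the packet. The toy keeps every packet in `exposed_buffer` indefinitely; a real design would have to bound that buffer, which is the kind of memory cost the abstract's analytical model addresses and which is not reproduced here.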
