Conference: International Symposium on Parallel and Distributed Computing

RDMA Managed Buffers: A Case for Accelerating Communication Bound Processes via Fine-Grained Events for Zero-Copy Message Passing

Abstract

To take full advantage of modern high-performance architectures, many large-scale data-driven applications require loosely coupled, fine-grained asynchronous communication. Accordingly, efficient lightweight middleware based on state-of-the-art networking technology such as RDMA is becoming a necessity. In existing message passing runtimes, the performance-critical task of handling RDMA synchronization is mostly coarse-grained and may therefore carry various hidden costs. Low-level RDMA libraries expose fine-grained RDMA communication that can be controlled efficiently, but they leave the critical tasks of RDMA buffer management, synchronization, and flow control to user-space applications, which requires tedious programming effort and may lead to sub-optimal performance. In this paper we present a user-space RDMA transport layer that manages RDMA-enabled memory internally while still exposing zero-copy, completion-event-based RDMA transfers for message passing. Integrating this RDMA transport layer allows parallel applications to use RDMA-managed buffers to accelerate communication while coexisting with high-level MPI, GASNet, or similar middleware. We show a speedup of up to 8x in latency/bandwidth benchmarks and a 5%-90% improvement in response time or messaging rate in three reference applications relative to their MPI implementations.
