Journal: International Journal of Parallel Programming

Improving the Performance of Distributed MXNet with RDMA



Abstract

MXNet, one of the most influential deep learning frameworks, has achieved excellent performance and many breakthroughs in academic and industrial fields across a variety of machine learning scenarios. The initial implementation of MXNet uses a proxy-socket interface, which delivers suboptimal performance in distributed environments. In a massively parallel training task, parameters are updated frequently during each training loop, so network performance becomes the main factor in overall performance. Over the past decade, high-performance interconnects have employed remote direct memory access (RDMA) technology to provide excellent performance for numerous scientific domains. In this paper, we describe an efficient design that extends the open-source MXNet to make it RDMA-capable via RDMA-based parameter server interfaces. With modest optimizations to memory usage and transmission overhead, RDMA-based MXNet achieves a substantial performance improvement over the original software. Our experiments reveal that, for the communication subsystem of MXNet, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For the two training cases on MXNet, the optimized implementation gains 5x and 9x speedups, respectively. Compared with experiments on the IP-over-InfiniBand (IPoIB) protocol, it achieves nearly 30% performance improvement, as well as better scalability and alleviation of bottlenecks.
