Improving the Performance of Distributed MXNet with RDMA

Li Mingfan; Wen Ke; Lin Han; Jin Xu; Wu Zheng; An Hong; Chi Mengxian

Abstract

As one of the most influential deep learning frameworks, MXNet has delivered excellent performance and many breakthroughs across academic and industrial machine learning applications. The original implementation of MXNet uses a proxy-socket interface, which yields suboptimal performance in distributed environments. In massively parallel training, parameters are updated frequently in every training iteration, so network performance becomes the dominant factor in overall performance. Over the past decade, high-performance interconnects have used remote direct memory access (RDMA) to deliver excellent performance across many scientific domains. In this paper, we describe an efficient design that extends open-source MXNet to support RDMA through RDMA-based parameter server interfaces. With modest optimizations to memory usage and transmission overhead, RDMA-based MXNet substantially outperforms the original software. Our experiments show that, for MXNet's communication subsystem, the new design achieves a 16x speedup (up to 21x at peak) over 1 Gigabit Ethernet (1GigE). For two training cases on MXNet, the optimized implementation achieves speedups of 5x and 9x, respectively. Compared with the IP-over-InfiniBand (IPoIB) protocol, it improves performance by nearly 30% while offering better scalability and relieving bottlenecks.
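The abstract does not spell out the parameter-server interface itself. As a rough illustration of why per-iteration communication can dominate, the following toy, in-process Python sketch models the synchronous push/pull pattern that MXNet's KVStore exposes. All class and function names (`ToyParameterServer`, `train_step`) are hypothetical; the sketch has no networking and no RDMA, is not the authors' implementation, and counts payload bytes under the assumption of 4-byte float32 values.

```python
# Toy, in-process model of a synchronous parameter server (push/pull).
# All names are hypothetical: no networking, no RDMA, not the paper's code.
from typing import Dict, List

Grads = Dict[int, List[float]]


class ToyParameterServer:
    """Stores parameters by key and applies averaged gradients with plain SGD."""

    def __init__(self, params: Dict[int, List[float]], lr: float = 0.1) -> None:
        self.params = {k: list(v) for k, v in params.items()}
        self.lr = lr
        self.pending: Dict[int, List[List[float]]] = {k: [] for k in params}
        self.bytes_moved = 0  # payload only, assuming 4-byte float32 values

    def push(self, key: int, grad: List[float]) -> None:
        """Worker -> server: queue one gradient for `key`."""
        self.pending[key].append(list(grad))
        self.bytes_moved += 4 * len(grad)

    def apply_updates(self) -> None:
        """Average the queued gradients per key and take one SGD step."""
        for key, grads in self.pending.items():
            if not grads:
                continue
            n = len(grads)
            avg = [sum(column) / n for column in zip(*grads)]
            self.params[key] = [w - self.lr * g for w, g in zip(self.params[key], avg)]
            grads.clear()

    def pull(self, key: int) -> List[float]:
        """Server -> worker: return a copy of the current parameters for `key`."""
        self.bytes_moved += 4 * len(self.params[key])
        return list(self.params[key])


def train_step(server: ToyParameterServer, worker_grads: List[Grads]) -> List[Grads]:
    """One synchronous iteration: every worker pushes all its keys, then pulls them."""
    for grads in worker_grads:
        for key, g in grads.items():
            server.push(key, g)
    server.apply_updates()
    return [{key: server.pull(key) for key in grads} for grads in worker_grads]


if __name__ == "__main__":
    server = ToyParameterServer({0: [1.0, 1.0], 1: [0.5]}, lr=0.25)
    worker_grads = [{0: [1.0, 3.0], 1: [2.0]}, {0: [3.0, 1.0], 1: [4.0]}]
    views = train_step(server, worker_grads)
    print(views[0])            # {0: [0.5, 0.5], 1: [-0.25]}
    print(server.bytes_moved)  # 48 = 2 directions * 2 workers * 3 floats * 4 bytes
```

In this simplified synchronous model, each iteration moves about 2 × (number of workers) × (model size in bytes) through the server, so traffic grows with both cluster size and model size. That is the regime in which, per the abstract, the transport underneath the parameter server becomes the main factor in overall training performance.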

Bibliographic Details

Source: International Journal of Parallel Programming, 2019, No. 3, pp. 467-480 (14 pages)
Affiliation: University of Science and Technology of China, Hefei 230026, Anhui, People's Republic of China
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Distributed MXNet; Parameter server; RDMA; InfiniBand; Network optimization

Similar Literature


1. Improving the Performance of Distributed TensorFlow with RDMA [J]. Chengfan Jia, Junnan Liu, Xu Jin. International Journal of Parallel Programming, 2018, No. 4.
2. Protocol Customization for Improving MPI Performance on RDMA-Enabled Clusters [J]. Zheng Gu, Matthew Small, Xin Yuan. International Journal of Parallel Programming, 2013, No. 5.
3. Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory [J]. Yang Hong, Yang Zheng, Fan Yang. Journal of Computer Science and Technology, 2019, No. 1.
4. RM-KVStore: New MXNet KVStore to Accelerate Transfer Performance with RDMA [C]. Baocai Lv, Bing Liu, Fang Liu. IEEE Symposium on Computers and Communications, 2018.
5. Effects of Processor-Native Memory Transactions in Optimizing RDMA Transfers in Distributed Shared Memory Systems [D]. Kyriakos Paraskevas. 2021.
6. Improved Quantum Artificial Fish Algorithm Application to Distributed Network Considering Distributed Generation [O]. Tingsong Du, Yang Hu, Xianting Ke. 2015.
7. A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore [O]. Julien Hascoët, Benoît Dupont de Dinechin, Karol Desnos. 2018.