IEEE Transactions on Parallel and Distributed Systems

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning



Abstract

Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. As computational power increases, network communication generally limits system scalability. Wait-free backpropagation (WFBP) is a popular solution that overlaps communications with computations during training. In this article, we observe that in distributed training, many DNNs have a large number of layers with only a small amount of data to be communicated per layer, which can make WFBP inefficient. Based on the fact that merging several short communication tasks into a single one reduces the overall communication time, we formulate an optimization problem to minimize the training time when pipelining communications and computations. We derive an optimal solution that can be computed efficiently without affecting training performance. We then apply the solution in a distributed training algorithm named merged-gradient WFBP (MG-WFBP) and implement it on two platforms, Caffe and PyTorch. Extensive experiments on three GPU clusters are conducted to verify the effectiveness of MG-WFBP. We further use trace-based simulations of 4 to 2048 GPUs to explore the potential scaling efficiency of MG-WFBP. Experimental results show that MG-WFBP achieves much better scaling performance than existing methods.
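The key observation above — that each small per-layer allreduce pays a fixed startup cost, so merging several small gradient tensors into one buffer lets a single allreduce replace many short ones — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the helper names are hypothetical, and the allreduce is simulated as an in-process average rather than an MPI/NCCL call.

```python
def flatten_grads(grads):
    """Merge per-layer gradient lists into one flat buffer,
    so one communication call can cover all layers."""
    buf = []
    for g in grads:
        buf.extend(g)
    return buf

def unflatten_grads(buf, sizes):
    """Split the merged buffer back into per-layer gradients."""
    out, off = [], 0
    for n in sizes:
        out.append(buf[off:off + n])
        off += n
    return out

def simulated_allreduce(buffers):
    """Stand-in for an MPI/NCCL allreduce: element-wise average
    of one buffer per worker."""
    return [sum(vals) / len(buffers) for vals in zip(*buffers)]

# Two workers, three layers with small gradients each.
worker_a = [[1.0] * 16, [1.0] * 8, [1.0] * 6]
worker_b = [[3.0] * 16, [3.0] * 8, [3.0] * 6]
sizes = [len(g) for g in worker_a]

# One merged allreduce instead of three short ones.
merged = simulated_allreduce([flatten_grads(worker_a),
                              flatten_grads(worker_b)])
averaged = unflatten_grads(merged, sizes)  # each layer holds 2.0 everywhere
```

With a communication cost of roughly (startup latency + bytes / bandwidth) per call, merging the three layers saves two startup latencies at the price of delaying the first layer's transfer — the trade-off MG-WFBP's optimization decides layer by layer.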
