IEEE Transactions on Parallel and Distributed Systems

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning



Abstract

Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. As computational power increases, network communication generally limits system scalability. Wait-free backpropagation (WFBP) is a popular solution that overlaps communications with computations during training. In this article, we observe that in distributed training, many DNNs have a large number of layers with only a small amount of data to be communicated per layer, which can make WFBP inefficient. Based on the fact that merging several short communication tasks into a single one reduces the overall communication time, we formulate an optimization problem to minimize the training time when pipelining communications and computations. We derive an optimal solution that can be computed efficiently without affecting training performance. We then apply the solution in a distributed training algorithm named merged-gradient WFBP (MG-WFBP) and implement it on two platforms, Caffe and PyTorch. Extensive experiments on three GPU clusters are conducted to verify the effectiveness of MG-WFBP. We further use trace-based simulations of 4 to 2048 GPUs to explore the potential scaling efficiency of MG-WFBP. Experimental results show that MG-WFBP achieves much better scaling performance than existing methods.
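The key observation above — that each small per-layer allreduce pays a fixed startup cost, so merging several small gradient tensors into one buffer lets a single allreduce replace many short ones — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the helper names are hypothetical, and the allreduce is simulated as an in-process average rather than an MPI/NCCL call.

```python
def flatten_grads(grads):
    """Merge per-layer gradient lists into one flat buffer,
    so one communication call can cover all layers."""
    buf = []
    for g in grads:
        buf.extend(g)
    return buf

def unflatten_grads(buf, sizes):
    """Split the merged buffer back into per-layer gradients."""
    out, off = [], 0
    for n in sizes:
        out.append(buf[off:off + n])
        off += n
    return out

def simulated_allreduce(buffers):
    """Stand-in for an MPI/NCCL allreduce: element-wise average
    of one buffer per worker."""
    return [sum(vals) / len(buffers) for vals in zip(*buffers)]

# Two workers, three layers with small gradients each.
worker_a = [[1.0] * 16, [1.0] * 8, [1.0] * 6]
worker_b = [[3.0] * 16, [3.0] * 8, [3.0] * 6]
sizes = [len(g) for g in worker_a]

# One merged allreduce instead of three short ones.
merged = simulated_allreduce([flatten_grads(worker_a),
                              flatten_grads(worker_b)])
averaged = unflatten_grads(merged, sizes)  # each layer holds 2.0 everywhere
```

With a communication cost of roughly (startup latency + bytes / bandwidth) per call, merging the three layers saves two startup latencies at the price of delaying the first layer's transfer — the trade-off MG-WFBP's optimization decides layer by layer.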
