Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD




Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdown, greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) that overlaps communication and computation so as to speed up the distributed training procedure. The approach also helps to mitigate straggler effects. We achieve this by adding an anchor model on each node. After multiple local updates, the locally trained models are pulled back towards the synchronized anchor model rather than communicating with each other. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions. A full version of this paper with additional examples and proofs is accessible at: http://andrew.cmu.edu/user/gaurij/overlap_local_SGD.pdf.
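The anchor-model idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the pull-back strength `alpha`, the rule that the anchor is refreshed with an average that arrives one round late (standing in for communication that overlaps with computation), the function name `overlap_local_sgd`, and the toy noisy quadratic objective are all hypothetical choices made for illustration.

```python
import numpy as np

def overlap_local_sgd(grad_fn, x0, num_workers=4, rounds=50, tau=5,
                      lr=0.1, alpha=0.5, seed=0):
    """Toy simulation: tau local SGD steps, then a pull toward a stale anchor.

    Assumption: the anchor is the worker average launched one round earlier,
    so workers never wait for communication to finish.
    """
    rng = np.random.default_rng(seed)
    local = [x0.copy() for _ in range(num_workers)]  # one model per worker
    anchor = x0.copy()      # anchor copy shared by all workers
    in_flight = x0.copy()   # average "being communicated"; lands next round

    for _ in range(rounds):
        # Local computation: tau SGD steps per worker, no synchronization.
        for i in range(num_workers):
            for _ in range(tau):
                local[i] -= lr * grad_fn(local[i], rng)
        # The average launched in the previous round has now arrived.
        anchor = in_flight
        # Pull every local model back toward the (stale) anchor.
        for i in range(num_workers):
            local[i] -= alpha * (local[i] - anchor)
        # Launch a new average; it is only used in the next round.
        in_flight = np.mean(local, axis=0)
    return anchor, local

# Hypothetical toy objective: f(x) = 0.5 * ||x - target||^2 with noisy gradients.
target = np.array([1.0, -2.0, 3.0])

def noisy_quadratic_grad(x, rng):
    return (x - target) + 0.01 * rng.standard_normal(x.shape)

anchor, local = overlap_local_sgd(noisy_quadratic_grad, np.zeros(3))
print("distance of anchor to optimum:", np.linalg.norm(anchor - target))
```

In this sketch the workers only ever read an anchor that is already available locally, which is the sense in which communication delay is hidden; a real deployment would replace `in_flight` with a non-blocking all-reduce.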


