
Deep learning with Elastic Averaging SGD

Abstract

We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm is proposed in this setting, where the communication and coordination of work among concurrent processes (local workers) is based on an elastic force which links the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e. it allows the local variables to fluctuate further from the center variable by reducing the amount of communication between the local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm. We provide a stability analysis of the asynchronous variant in the round-robin scheme and compare it with the more common parallelized method ADMM. We show that the stability of EASGD is guaranteed when a simple stability condition is satisfied, which is not the case for ADMM. We additionally propose a momentum-based version of our algorithm that can be applied in both synchronous and asynchronous settings. The asynchronous variant of the algorithm is applied to train convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches, and is furthermore very communication efficient.
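The elastic link described in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative NumPy implementation of one synchronous elastic-averaging update, in which each local variable takes an SGD step plus a pull toward the center, and the center moves toward the average of the local variables. The function name easgd_sync_step, the shape of the inputs, and the values of the learning rate eta and elastic coefficient rho are assumptions made for illustration, not the paper's reference implementation.

import numpy as np

def easgd_sync_step(local_params, center, grads, eta=0.01, rho=0.01):
    """One synchronous elastic-averaging update (illustrative sketch).

    local_params : list of per-worker parameter vectors (np.ndarray)
    center       : center variable held by the master (np.ndarray)
    grads        : list of per-worker stochastic gradients at local_params
    eta, rho     : learning rate and elastic coefficient (assumed values)
    """
    alpha = eta * rho
    elastic_sum = np.zeros_like(center)
    new_locals = []
    for x, g in zip(local_params, grads):
        diff = x - center
        # Local SGD step plus an elastic pull toward the center variable.
        new_locals.append(x - eta * g - alpha * diff)
        elastic_sum += diff
    # The center variable moves toward the average of the local variables.
    new_center = center + alpha * elastic_sum
    return new_locals, new_center

Performing this exchange with the center less frequently (a longer communication period) is what lets the local variables fluctuate further from the center, which is the source of the additional exploration discussed above.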
