
Deep learning with Elastic Averaging SGD

Abstract

We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm is proposed in this setting, where the communication and coordination of work among concurrent processes (local workers) is based on an elastic force which links the parameters they compute with a center variable stored by the parameter server (master). The algorithm enables the local workers to perform more exploration, i.e. it allows the local variables to fluctuate further from the center variable by reducing the amount of communication between the local workers and the master. We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance. We propose synchronous and asynchronous variants of the new algorithm. We provide a stability analysis of the asynchronous variant in the round-robin scheme and compare it with the more common parallelized method ADMM. We show that the stability of EASGD is guaranteed when a simple stability condition is satisfied, which is not the case for ADMM. We additionally propose a momentum-based version of our algorithm that can be applied in both synchronous and asynchronous settings. The asynchronous variant of the algorithm is applied to train convolutional neural networks for image classification on the CIFAR and ImageNet datasets. Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches, and is furthermore very communication efficient.
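The elastic link described in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative NumPy implementation of one synchronous elastic-averaging update, in which each local variable takes an SGD step plus a pull toward the center, and the center moves toward the average of the local variables. The function name easgd_sync_step, the shape of the inputs, and the values of the learning rate eta and elastic coefficient rho are assumptions made for illustration, not the paper's reference implementation.

import numpy as np

def easgd_sync_step(local_params, center, grads, eta=0.01, rho=0.01):
    """One synchronous elastic-averaging update (illustrative sketch).

    local_params : list of per-worker parameter vectors (np.ndarray)
    center       : center variable held by the master (np.ndarray)
    grads        : list of per-worker stochastic gradients at local_params
    eta, rho     : learning rate and elastic coefficient (assumed values)
    """
    alpha = eta * rho
    elastic_sum = np.zeros_like(center)
    new_locals = []
    for x, g in zip(local_params, grads):
        diff = x - center
        # Local SGD step plus an elastic pull toward the center variable.
        new_locals.append(x - eta * g - alpha * diff)
        elastic_sum += diff
    # The center variable moves toward the average of the local variables.
    new_center = center + alpha * elastic_sum
    return new_locals, new_center

Performing this exchange with the center less frequently (a longer communication period) is what lets the local variables fluctuate further from the center, which is the source of the additional exploration discussed above.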
