Improving training time of deep neural network with asynchronous averaged stochastic gradient descent

Abstract

Deep neural network (DNN) acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Stochastic gradient descent (SGD) is the most popular method for training deep neural networks, but training a DNN with minibatch-based SGD is very slow: the updates are inherently serial, and the whole training set must be scanned many times before the parameters reach the asymptotic region, which makes it difficult to scale to large datasets. Training time can be reduced from two directions: reducing the number of training epochs and using distributed training algorithms. Several distributed training algorithms, such as L-BFGS, Hessian-free optimization, and asynchronous SGD, have been shown to reduce training time significantly. To reduce training time further, we explore a training algorithm with fast convergence and combine it with a distributed training algorithm. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded-speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multiple-pass asynchronous SGD, while reducing the training time by a factor of 6.3.
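
The key ingredient of ASGD is Polyak-Ruppert averaging: alongside the ordinary SGD iterate, a running average of the parameters is maintained and returned as the final model, which is what makes a single pass over the data competitive with multiple SGD passes. Below is a minimal sketch of averaged SGD on a toy least-squares problem; it omits the asynchronous, multi-worker part of the paper's setup, and the variable names (eta0, w_avg) are illustrative, not taken from the paper.

```python
# Minimal sketch of averaged SGD (Polyak-Ruppert averaging) on a toy
# least-squares problem. The asynchronous, parameter-server setup used
# in the paper is not reproduced here; this only illustrates the
# averaging step that enables one-pass training.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # toy design matrix
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(10)        # current iterate, updated by plain SGD
w_avg = np.zeros(10)    # running average, returned as the final model
eta0 = 0.05             # base learning rate (illustrative value)

t = 0
for i in rng.permutation(len(X)):        # a single pass over the data
    t += 1
    eta = eta0 / (1.0 + eta0 * 0.01 * t) # slowly decaying step size
    grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x.w - y)^2
    w = w - eta * grad                   # ordinary SGD step
    w_avg += (w - w_avg) / t             # incremental Polyak average

print("SGD iterate error :", np.linalg.norm(w - w_true))
print("Averaged error    :", np.linalg.norm(w_avg - w_true))
```

In the asynchronous variant studied in the paper, multiple workers would compute such updates in parallel against shared parameters; the sketch above only shows the single-worker averaging rule.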