International Symposium on Chinese Spoken Language Processing

Improving training time of deep neural network with asynchronous averaged stochastic gradient descent



Abstract

Deep neural network acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Typically, stochastic gradient descent (SGD) is the most popular method for training deep neural networks. However, training DNNs with minibatch-based SGD is very slow, because it requires frequent serial updates and many passes over the whole training set before reaching the asymptotic region, making it difficult to scale to large datasets. Training time can generally be reduced in two ways: reducing the number of training epochs and exploiting distributed training algorithms. Several distributed training algorithms, such as L-BFGS, Hessian-free optimization and asynchronous SGD, have been shown to reduce training time significantly. To further reduce training time, we explore a training algorithm with fast convergence and combine it with a distributed training algorithm. Averaged stochastic gradient descent (ASGD) has proven simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multi-pass asynchronous SGD, while reducing training time by a factor of 6.3.
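The abstract relies on averaged SGD (ASGD) for one-pass learning. As a rough sketch of the averaging idea only, and not the paper's distributed implementation, the snippet below runs ordinary SGD while maintaining a Polyak-Ruppert running average of the iterates; the function names and the toy least-squares example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def asgd_one_pass(grad, data, w0, lr=0.01):
    """Single pass of averaged SGD (Polyak-Ruppert averaging).

    grad(w, x, y) returns the gradient of the per-example loss;
    the averaged iterate w_bar is returned instead of the last iterate.
    """
    w = np.array(w0, dtype=float)       # current SGD iterate
    w_bar = w.copy()                    # running average of iterates
    for t, (x, y) in enumerate(data, start=1):
        w = w - lr * grad(w, x, y)      # ordinary SGD step on one example
        w_bar += (w - w_bar) / t        # incremental mean of w_1 .. w_t
    return w_bar

# Toy usage on synthetic least-squares data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
Y = X @ true_w + 0.1 * rng.normal(size=1000)

sq_grad = lambda w, x, y: 2.0 * (x @ w - y) * x
w_avg = asgd_one_pass(sq_grad, zip(X, Y), np.zeros(5), lr=0.01)
```

In the paper's setting this averaging is combined with asynchronous, distributed updates of the DNN parameters; the sketch above only illustrates why a single pass can suffice once the iterates are averaged.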


