Improving training time of deep neural network with asynchronous averaged stochastic gradient descent

Abstract

Deep neural network (DNN) acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Stochastic gradient descent (SGD) is the most popular method for training deep neural networks, but training a DNN with minibatch-based SGD is very slow: the updates are inherently serial, and the whole training set must be scanned many times before the parameters reach the asymptotic region, which makes it difficult to scale to large datasets. Training time can be reduced from two directions: reducing the number of training epochs and using distributed training algorithms. Several distributed training algorithms, such as L-BFGS, Hessian-free optimization, and asynchronous SGD, have been shown to reduce training time significantly. To reduce training time further, we explore a training algorithm with fast convergence and combine it with a distributed training algorithm. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded-speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multiple-pass asynchronous SGD, while reducing the training time by a factor of 6.3.
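
The key ingredient of ASGD is Polyak-Ruppert averaging: alongside the ordinary SGD iterate, a running average of the parameters is maintained and returned as the final model, which is what makes a single pass over the data competitive with multiple SGD passes. Below is a minimal sketch of averaged SGD on a toy least-squares problem; it omits the asynchronous, multi-worker part of the paper's setup, and the variable names (eta0, w_avg) are illustrative, not taken from the paper.

```python
# Minimal sketch of averaged SGD (Polyak-Ruppert averaging) on a toy
# least-squares problem. The asynchronous, parameter-server setup used
# in the paper is not reproduced here; this only illustrates the
# averaging step that enables one-pass training.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # toy design matrix
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(10)        # current iterate, updated by plain SGD
w_avg = np.zeros(10)    # running average, returned as the final model
eta0 = 0.05             # base learning rate (illustrative value)

t = 0
for i in rng.permutation(len(X)):        # a single pass over the data
    t += 1
    eta = eta0 / (1.0 + eta0 * 0.01 * t) # slowly decaying step size
    grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x.w - y)^2
    w = w - eta * grad                   # ordinary SGD step
    w_avg += (w - w_avg) / t             # incremental Polyak average

print("SGD iterate error :", np.linalg.norm(w - w_true))
print("Averaged error    :", np.linalg.norm(w_avg - w_true))
```

In the asynchronous variant studied in the paper, multiple workers would compute such updates in parallel against shared parameters; the sketch above only shows the single-worker averaging rule.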