International Symposium on Chinese Spoken Language Processing

Improving training time of deep neural network with asynchronous averaged stochastic gradient descent



Abstract

Deep neural network acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Typically, stochastic gradient descent (SGD) is the most popular method for training deep neural networks. However, training DNNs with minibatch-based SGD is very slow, because it requires frequent serial updates and many passes over the whole training set before reaching the asymptotic region, making it difficult to scale to large datasets. Training time can generally be reduced in two ways: reducing the number of training epochs and exploiting distributed training algorithms. Several distributed training algorithms, such as L-BFGS, Hessian-free optimization and asynchronous SGD, have been shown to reduce training time significantly. To further reduce training time, we explore a training algorithm with fast convergence and combine it with a distributed training algorithm. Averaged stochastic gradient descent (ASGD) has proven simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multi-pass asynchronous SGD, while reducing training time by a factor of 6.3.
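The abstract relies on averaged SGD (ASGD) for one-pass learning. As a rough sketch of the averaging idea only, and not the paper's distributed implementation, the snippet below runs ordinary SGD while maintaining a Polyak-Ruppert running average of the iterates; the function names and the toy least-squares example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def asgd_one_pass(grad, data, w0, lr=0.01):
    """Single pass of averaged SGD (Polyak-Ruppert averaging).

    grad(w, x, y) returns the gradient of the per-example loss;
    the averaged iterate w_bar is returned instead of the last iterate.
    """
    w = np.array(w0, dtype=float)       # current SGD iterate
    w_bar = w.copy()                    # running average of iterates
    for t, (x, y) in enumerate(data, start=1):
        w = w - lr * grad(w, x, y)      # ordinary SGD step on one example
        w_bar += (w - w_bar) / t        # incremental mean of w_1 .. w_t
    return w_bar

# Toy usage on synthetic least-squares data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
Y = X @ true_w + 0.1 * rng.normal(size=1000)

sq_grad = lambda w, x, y: 2.0 * (x @ w - y) * x
w_avg = asgd_one_pass(sq_grad, zip(X, Y), np.zeros(5), lr=0.01)
```

In the paper's setting this averaging is combined with asynchronous, distributed updates of the DNN parameters; the sketch above only illustrates why a single pass can suffice once the iterates are averaged.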


