IEEE International Conference on Acoustics, Speech and Signal Processing

Exploring one pass learning for deep neural network training with averaged stochastic gradient descent



Abstract

Deep neural network acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Typically, deep neural networks are trained with the cross-entropy criterion using stochastic gradient descent (SGD). However, plain SGD requires many passes over the whole training set before reaching the asymptotic region, making it difficult to scale to large datasets. It has been established that second-order SGD can potentially reach its asymptotic region in a single pass through the training data; however, because it requires the expensive computation of the inverse of the Hessian matrix of the loss function, its application is limited. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the ASGD algorithm for deep neural network training. We tested ASGD on a Mandarin Chinese recorded speech recognition task using deep neural networks. Experimental results show that the performance of one-pass ASGD is very close to that of multi-pass SGD.
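
The core idea of ASGD is simple: run ordinary SGD, but maintain a running average of the parameter iterates and use that average as the final estimate. Below is a minimal sketch of this averaging scheme (Polyak-Ruppert averaging) on a toy least-squares problem, assuming a generic per-example gradient function; the names and settings used here (asgd, lr0, decay, t0, the synthetic data) are illustrative and are not taken from the paper.

import numpy as np

def asgd(grad, w0, data, lr0=0.1, decay=1e-4, t0=100):
    # One pass of averaged SGD over `data` (an iterable of (x, y) pairs).
    # The running iterate w follows plain SGD; the returned estimate w_bar
    # is the average of the iterates visited after the first t0 steps.
    w = w0.copy()
    w_bar = w0.copy()
    n_avg = 0
    for t, (x, y) in enumerate(data, start=1):
        lr = lr0 / (1.0 + decay * t) ** 0.75      # slowly decaying step size
        w = w - lr * grad(w, x, y)                # ordinary SGD update
        if t > t0:                                # average only after a short burn-in
            n_avg += 1
            w_bar = w_bar + (w - w_bar) / n_avg   # running mean of the iterates
    return w_bar if n_avg > 0 else w

# Toy usage: one pass of ASGD on a synthetic least-squares problem.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(5000, 2))
Y = X @ true_w + 0.1 * rng.normal(size=5000)

squared_loss_grad = lambda w, x, y: 2.0 * (x @ w - y) * x
w_hat = asgd(squared_loss_grad, np.zeros(2), zip(X, Y))
print(w_hat)   # expected to be close to [2, -1] after a single pass

Because the averaged iterate smooths out the noise of individual SGD steps, the estimate can approach the asymptotic region after a single pass, which is the behavior the paper exploits for deep neural network training.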
