IEEE International Conference on Acoustics, Speech and Signal Processing

Exploring one pass learning for deep neural network training with averaged stochastic gradient descent



Abstract

Deep neural network acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Typically, deep neural networks are trained with the cross-entropy criterion using stochastic gradient descent (SGD). However, plain SGD requires many passes over the whole training set before reaching the asymptotic region, making it difficult to scale to large datasets. It has been established that second-order SGD can potentially reach its asymptotic region in a single pass through the training data; however, because it requires the expensive computation of the inverse of the Hessian matrix of the loss function, its application is limited. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the ASGD algorithm for deep neural network training. We tested ASGD on a Mandarin Chinese recorded speech recognition task using deep neural networks. Experimental results show that the performance of one-pass ASGD is very close to that of multi-pass SGD.
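
The core idea of ASGD is simple: run ordinary SGD, but maintain a running average of the parameter iterates and use that average as the final estimate. Below is a minimal sketch of this averaging scheme (Polyak-Ruppert averaging) on a toy least-squares problem, assuming a generic per-example gradient function; the names and settings used here (asgd, lr0, decay, t0, the synthetic data) are illustrative and are not taken from the paper.

import numpy as np

def asgd(grad, w0, data, lr0=0.1, decay=1e-4, t0=100):
    # One pass of averaged SGD over `data` (an iterable of (x, y) pairs).
    # The running iterate w follows plain SGD; the returned estimate w_bar
    # is the average of the iterates visited after the first t0 steps.
    w = w0.copy()
    w_bar = w0.copy()
    n_avg = 0
    for t, (x, y) in enumerate(data, start=1):
        lr = lr0 / (1.0 + decay * t) ** 0.75      # slowly decaying step size
        w = w - lr * grad(w, x, y)                # ordinary SGD update
        if t > t0:                                # average only after a short burn-in
            n_avg += 1
            w_bar = w_bar + (w - w_bar) / n_avg   # running mean of the iterates
    return w_bar if n_avg > 0 else w

# Toy usage: one pass of ASGD on a synthetic least-squares problem.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(5000, 2))
Y = X @ true_w + 0.1 * rng.normal(size=5000)

squared_loss_grad = lambda w, x, y: 2.0 * (x @ w - y) * x
w_hat = asgd(squared_loss_grad, np.zeros(2), zip(X, Y))
print(w_hat)   # expected to be close to [2, -1] after a single pass

Because the averaged iterate smooths out the noise of individual SGD steps, the estimate can approach the asymptotic region after a single pass, which is the behavior the paper exploits for deep neural network training.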
