Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Abstract

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
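
The recipe described in the abstract, injecting properly scaled Gaussian noise into each stochastic-gradient update during training and averaging posterior weight samples at test time, corresponds to SG-MCMC methods such as stochastic gradient Langevin dynamics (SGLD). Below is a minimal PyTorch sketch of an SGLD-style update applied to a toy LSTM language model; the network sizes, learning rate, synthetic data, and sampling schedule are illustrative assumptions, not the paper's exact algorithm or configuration.

```python
# Minimal sketch of an SGLD-style update (one member of the SG-MCMC family)
# on a toy LSTM language model. Hyper-parameters and data are illustrative.
import copy
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, emb, hid, seq_len, n_train = 50, 16, 32, 12, 1000

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)

model = TinyLM()
criterion = nn.CrossEntropyLoss()
lr, burn_in, thin = 1e-4, 50, 10
posterior_samples = []  # weight samples used for model averaging at test time

for step in range(200):
    # Synthetic mini-batch standing in for real language-model data.
    x = torch.randint(0, vocab, (8, seq_len))
    y = torch.randint(0, vocab, (8, seq_len))

    model.zero_grad()
    loss = criterion(model(x).reshape(-1, vocab), y.reshape(-1))
    loss.backward()

    with torch.no_grad():
        for p in model.parameters():
            # SGLD update (one common convention):
            #   theta <- theta - lr * grad(U) + N(0, 2 * lr)
            # The mini-batch gradient is rescaled by n_train so the likelihood
            # term approximates the full training set; a prior (weight-decay)
            # term could be added to the gradient as well.
            noise = torch.randn_like(p) * math.sqrt(2.0 * lr)
            p.add_(-lr * n_train * p.grad + noise)

    # After burn-in, keep thinned weight samples; test-time predictions
    # average the predictive distributions of these samples.
    if step > burn_in and step % thin == 0:
        posterior_samples.append(copy.deepcopy(model.state_dict()))
```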
