
rnnDrop: A novel dropout for RNNs in ASR



Abstract

Recently, recurrent neural networks (RNNs) have achieved state-of-the-art performance in several applications that deal with temporal data, e.g., speech recognition, handwriting recognition and machine translation. While the ability to handle long-term dependencies in data is key to the success of RNNs, combating over-fitting during training is a critical issue for achieving cutting-edge performance, particularly as the depth and size of the network increase. To that end, there have been some attempts to apply dropout, a popular regularization scheme for feed-forward neural networks, to RNNs, but they do not perform as well as other regularization schemes such as weight noise injection. In this paper, we propose rnnDrop, a novel variant of dropout tailored for RNNs. Unlike existing methods, where dropout is applied only to the non-recurrent connections, the proposed method applies dropout to the recurrent connections as well, in such a way that RNNs still generalize well. Our experiments show that rnnDrop is a better regularization method than others, including weight noise injection. Namely, when deep bidirectional long short-term memory (LSTM) RNNs were trained with rnnDrop as acoustic models for phoneme and speech recognition, they significantly outperformed the previous state of the art; we achieved a phoneme error rate of 16.29% on the TIMIT core test set for phoneme recognition and a word error rate of 5.53% on the Wall Street Journal (WSJ) dataset, dev93, for speech recognition, which are the best reported results on both datasets.
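The key difference from earlier RNN dropout schemes described in the abstract is that the dropout mask is applied to the recurrent connections as well, not only to the input/output connections. A minimal sketch of that idea is below, with a simple tanh RNN standing in for the bidirectional LSTM used in the paper; the mask is sampled once per sequence and reused at every time step, so the recurrent path is not hit by a fresh mask at each step. All function and variable names here are illustrative, not from the paper's code.

```python
import numpy as np

def rnn_drop_forward(x_seq, W_in, W_rec, b, drop_prob=0.3, rng=None):
    """Forward pass of a tanh RNN with dropout on the recurrent connection.

    Sketch only: the paper uses deep bidirectional LSTMs, but the core
    rnnDrop idea shown here is (1) dropping units on the recurrent path,
    and (2) sampling one dropout mask per sequence rather than per step.
    """
    rng = rng or np.random.default_rng(0)
    hidden = W_rec.shape[0]
    # One inverted-dropout mask for the whole sequence (0 or 1/(1-p)).
    mask = (rng.random(hidden) >= drop_prob) / (1.0 - drop_prob)
    h = np.zeros(hidden)
    outputs = []
    for x_t in x_seq:
        # Dropout applied to the recurrent term, not just the input term.
        h = np.tanh(W_in @ x_t + W_rec @ (h * mask) + b)
        outputs.append(h)
    return np.array(outputs)
```

At test time the mask would simply be omitted (all-ones), as with standard inverted dropout.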
