IEEE International Conference on Acoustics, Speech and Signal Processing

Exploiting different word clusterings for class-based RNN language modeling in speech recognition



Abstract

We propose to exploit the potential of multiple word clusterings in class-based recurrent neural network (RNN) language models for ensemble RNN language modeling. By varying the clustering criteria and the space of word embedding, different word clusterings are obtained to define different word/class factorizations. For each such word/class factorization, several base RNNLMs are learned, and the word prediction probabilities of the base RNNLMs are then combined to form an ensemble prediction. We use a greedy backward model selection procedure to select a subset of models and combine these models for word prediction. The proposed ensemble language modeling method has been evaluated on Penn Treebank test set as well as Wall Street Journal (WSJ) Eval 92 and 93 test sets, where it improved test set perplexity and word error rate over the state-of-the-art single RNNLMs as well as multiple RNNLMs produced by varying RNN learning conditions.


