Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models

Abstract

Recurrent neural network language models (RNNLMs) form a valuable foundation for many NLP systems, but training them can be computationally expensive and may take days on a large corpus. We explore a technique that uses large-corpus n-gram statistics as a regularizer when training a neural network LM on a smaller corpus. In experiments with the Billion-Word and Wikitext corpora, we show that the technique is effective and more time-efficient than simply training on a larger sequential corpus. We also introduce new strategies for selecting the most informative n-grams, and show that these boost efficiency.
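The abstract does not spell out the form of the regularizer, so the following is only a minimal sketch of one plausible reading: a KL-divergence penalty between empirical next-word distributions taken from large-corpus bigram counts and the distributions predicted by the RNNLM, added to the usual cross-entropy loss on the small corpus. The TinyRNNLM model, the toy vocabulary and random "statistics", and the lambda_reg weight are all hypothetical placeholders, not the paper's implementation.

# Minimal sketch (assumed formulation, not the paper's exact method):
# large-corpus bigram statistics act as soft targets in a KL regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, hidden_dim = 100, 32, 64

class TinyRNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)  # next-token logits at every position

model = TinyRNNLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Small-corpus training batch: (batch, seq_len) token ids and shifted targets.
inputs = torch.randint(0, vocab_size, (8, 20))
targets = torch.randint(0, vocab_size, (8, 20))

# Hypothetical large-corpus bigram statistics: for each sampled context token,
# the empirical next-word distribution P_ngram(w | context).
ngram_contexts = torch.randint(0, vocab_size, (50, 1))          # (N, 1) contexts
ngram_targets = F.softmax(torch.randn(50, vocab_size), dim=-1)  # (N, V) empirical dists

lambda_reg = 0.1  # regularization strength (assumed hyperparameter)

# Standard next-word cross-entropy on the small sequential corpus.
logits = model(inputs)
ce_loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

# Regularizer: KL(P_ngram || P_model) evaluated on the n-gram contexts.
ngram_logits = model(ngram_contexts)[:, -1, :]  # model's prediction after each context
log_p_model = F.log_softmax(ngram_logits, dim=-1)
kl = F.kl_div(log_p_model, ngram_targets, reduction="batchmean")

loss = ce_loss + lambda_reg * kl
loss.backward()
optimizer.step()

Under this reading, the "most informative n-grams" selection the abstract mentions would correspond to choosing which contexts populate ngram_contexts, but the abstract gives no detail on how that selection is scored.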