Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models

Abstract

Recurrent neural network language models (RNNLMs) form a valuable foundation for many NLP systems, but training them can be computationally expensive and may take days on a large corpus. We explore a technique that uses large-corpus n-gram statistics as a regularizer when training a neural network LM on a smaller corpus. In experiments with the Billion-Word and Wikitext corpora, we show that the technique is effective and more time-efficient than simply training on a larger sequential corpus. We also introduce new strategies for selecting the most informative n-grams, and show that these boost efficiency.
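The abstract does not spell out the form of the regularizer, so the following is only a minimal sketch of one plausible reading: a KL-divergence penalty between empirical next-word distributions taken from large-corpus bigram counts and the distributions predicted by the RNNLM, added to the usual cross-entropy loss on the small corpus. The TinyRNNLM model, the toy vocabulary and random "statistics", and the lambda_reg weight are all hypothetical placeholders, not the paper's implementation.

# Minimal sketch (assumed formulation, not the paper's exact method):
# large-corpus bigram statistics act as soft targets in a KL regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, hidden_dim = 100, 32, 64

class TinyRNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)  # next-token logits at every position

model = TinyRNNLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Small-corpus training batch: (batch, seq_len) token ids and shifted targets.
inputs = torch.randint(0, vocab_size, (8, 20))
targets = torch.randint(0, vocab_size, (8, 20))

# Hypothetical large-corpus bigram statistics: for each sampled context token,
# the empirical next-word distribution P_ngram(w | context).
ngram_contexts = torch.randint(0, vocab_size, (50, 1))          # (N, 1) contexts
ngram_targets = F.softmax(torch.randn(50, vocab_size), dim=-1)  # (N, V) empirical dists

lambda_reg = 0.1  # regularization strength (assumed hyperparameter)

# Standard next-word cross-entropy on the small sequential corpus.
logits = model(inputs)
ce_loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

# Regularizer: KL(P_ngram || P_model) evaluated on the n-gram contexts.
ngram_logits = model(ngram_contexts)[:, -1, :]  # model's prediction after each context
log_p_model = F.log_softmax(ngram_logits, dim=-1)
kl = F.kl_div(log_p_model, ngram_targets, reduction="batchmean")

loss = ce_loss + lambda_reg * kl
loss.backward()
optimizer.step()

Under this reading, the "most informative n-grams" selection the abstract mentions would correspond to choosing which contexts populate ngram_contexts, but the abstract gives no detail on how that selection is scored.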