Annual meeting of the Association for Computational Linguistics

Improved Language Modeling by Decoding the Past


Abstract

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our Past Decode Regularization (PDR) method improves perplexity on the Penn Treebank dataset by up to 1.8 points and on the WikiText-2 dataset by up to 2.3 points, over strong regularized baselines using a single softmax. With a mixture-of-softmax model, we show gains of up to 1.0 perplexity points on these datasets. In addition, our method achieves 1.169 bits-per-character on the Penn Treebank Character dataset for character-level language modeling. Each of these results constitutes an improvement over models without PDR in the respective setting.
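The abstract only sketches the mechanism, so the PyTorch snippet below is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes the predicted next-token distribution is summarized as an expected embedding and passed through a hypothetical prev_decoder module to recover the last token of the context, whose cross-entropy is added to the usual next-token loss with an arbitrary illustrative weight pdr_weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lm_loss_with_past_decode(logits, embedding_weight, prev_decoder,
                             next_targets, prev_targets, pdr_weight=0.1):
    """Next-token cross-entropy plus an illustrative past-decode penalty.

    logits           : (batch, vocab) predicted next-token logits
    embedding_weight : (vocab, dim) token embedding matrix
    prev_decoder     : module mapping (batch, dim) vectors to (batch, vocab) logits
    next_targets     : (batch,) ids of the true next tokens
    prev_targets     : (batch,) ids of the last tokens in the context
    pdr_weight       : weight of the regularization term (hypothetical value)
    """
    # Standard language-modeling loss on the next token.
    nll = F.cross_entropy(logits, next_targets)

    # Summarize the predicted next-token distribution as an expected embedding,
    # then try to decode the *previous* token from it. If the distribution has
    # retained enough contextual information, this decoding should succeed.
    probs = F.softmax(logits, dim=-1)            # (batch, vocab)
    expected_emb = probs @ embedding_weight      # (batch, dim)
    prev_logits = prev_decoder(expected_emb)     # (batch, vocab)
    pdr_loss = F.cross_entropy(prev_logits, prev_targets)

    return nll + pdr_weight * pdr_loss


# Tiny usage example with random tensors (shapes only, no real language model).
if __name__ == "__main__":
    vocab, dim, batch = 100, 16, 4
    embedding = nn.Embedding(vocab, dim)
    prev_decoder = nn.Linear(dim, vocab)
    logits = torch.randn(batch, vocab)
    next_targets = torch.randint(0, vocab, (batch,))
    prev_targets = torch.randint(0, vocab, (batch,))
    loss = lm_loss_with_past_decode(logits, embedding.weight, prev_decoder,
                                    next_targets, prev_targets)
    print(loss.item())
```

Because the penalty is computed from the same softmax output used for prediction, this kind of regularizer adds essentially no parameters beyond the small previous-token decoder, consistent with the negligible overhead claimed in the abstract.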

