Annual meeting of the Association for Computational Linguistics

Improved Language Modeling by Decoding the Past


Abstract

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our Past Decode Regularization (PDR) method improves perplexity on the Penn Treebank dataset by up to 1.8 points and on the WikiText-2 dataset by up to 2.3 points, over strong regularized baselines using a single softmax. With a mixture-of-softmax model, we show gains of up to 1.0 perplexity points on these datasets. In addition, our method achieves 1.169 bits-per-character on the Penn Treebank Character dataset for character-level language modeling. Each of these results constitutes an improvement over models without PDR in the respective setting.
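The abstract only sketches the mechanism, so the PyTorch snippet below is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes the predicted next-token distribution is summarized as an expected embedding and passed through a hypothetical prev_decoder module to recover the last token of the context, whose cross-entropy is added to the usual next-token loss with an arbitrary illustrative weight pdr_weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lm_loss_with_past_decode(logits, embedding_weight, prev_decoder,
                             next_targets, prev_targets, pdr_weight=0.1):
    """Next-token cross-entropy plus an illustrative past-decode penalty.

    logits           : (batch, vocab) predicted next-token logits
    embedding_weight : (vocab, dim) token embedding matrix
    prev_decoder     : module mapping (batch, dim) vectors to (batch, vocab) logits
    next_targets     : (batch,) ids of the true next tokens
    prev_targets     : (batch,) ids of the last tokens in the context
    pdr_weight       : weight of the regularization term (hypothetical value)
    """
    # Standard language-modeling loss on the next token.
    nll = F.cross_entropy(logits, next_targets)

    # Summarize the predicted next-token distribution as an expected embedding,
    # then try to decode the *previous* token from it. If the distribution has
    # retained enough contextual information, this decoding should succeed.
    probs = F.softmax(logits, dim=-1)            # (batch, vocab)
    expected_emb = probs @ embedding_weight      # (batch, dim)
    prev_logits = prev_decoder(expected_emb)     # (batch, vocab)
    pdr_loss = F.cross_entropy(prev_logits, prev_targets)

    return nll + pdr_weight * pdr_loss


# Tiny usage example with random tensors (shapes only, no real language model).
if __name__ == "__main__":
    vocab, dim, batch = 100, 16, 4
    embedding = nn.Embedding(vocab, dim)
    prev_decoder = nn.Linear(dim, vocab)
    logits = torch.randn(batch, vocab)
    next_targets = torch.randint(0, vocab, (batch,))
    prev_targets = torch.randint(0, vocab, (batch,))
    loss = lm_loss_with_past_decode(logits, embedding.weight, prev_decoder,
                                    next_targets, prev_targets)
    print(loss.item())
```

Because the penalty is computed from the same softmax output used for prediction, this kind of regularizer adds essentially no parameters beyond the small previous-token decoder, consistent with the negligible overhead claimed in the abstract.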

