Within and Across Sentence Boundary Language Model

Abstract

In this paper, we propose two language modeling approaches, a skip trigram model and an across-sentence-boundary model, to capture long-range dependencies. The skip trigram model covers more distant predecessor words of the current word than a normal trigram while requiring the same memory space. The across-sentence-boundary model uses the word distribution of the previous sentences to compute a unigram probability, which is applied as the emission probability in the word and class model frameworks. Our experiments on the Penn Treebank [1] show that each of the proposed models, as well as their combination, significantly outperforms the baseline for both the word and class models and their linear interpolation. The linear interpolation of the word and class models with the proposed skip trigram and across-sentence-boundary models achieves a perplexity of 118.4, while the best state-of-the-art language model has a perplexity of 137.2 on the same dataset.
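The two components named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the unsmoothed maximum-likelihood estimates, and the choice of which context position is skipped are all assumptions made for the example.

```python
from collections import Counter

def skip_trigram_counts(sentences, skip=1):
    # Count (w[i-2-skip], w[i-1], w[i]) triples: the earlier context word
    # is taken `skip` positions further back than in a normal trigram,
    # so more distant predecessors are covered while the number of
    # stored triples (and hence the memory footprint) stays the same.
    tri, ctx_counts = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] * (2 + skip) + list(sent) + ["</s>"]
        for i in range(2 + skip, len(toks)):
            ctx = (toks[i - 2 - skip], toks[i - 1])
            tri[ctx + (toks[i],)] += 1
            ctx_counts[ctx] += 1
    return tri, ctx_counts

def skip_trigram_prob(w, ctx, tri, ctx_counts):
    # Maximum-likelihood estimate; a real model would add smoothing.
    return tri[ctx + (w,)] / ctx_counts[ctx] if ctx_counts[ctx] else 0.0

def across_sentence_unigram(prev_sentences):
    # Unigram distribution over the words of the preceding sentences,
    # i.e. the kind of quantity the across-sentence-boundary model uses
    # as an emission probability; here just raw relative frequencies.
    counts = Counter(w for s in prev_sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

In practice these estimates would be smoothed and linearly interpolated with the word and class models, as the abstract describes; the sketch only shows where the extra context enters each probability.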