Home > Foreign Conference Papers > Modeling and Using Context > A New Method Based on Context for Combining Statistical Language Models

A New Method Based on Context for Combining Statistical Language Models



Abstract

In this paper we propose a new method to extract from a corpus the histories for which a given language model is better than another. The decision is based on a measure derived from perplexity, which, for a given history, compares two language models and selects the better one for that history. Using this principle, with a 20K-word vocabulary, we combined two language models: a bigram and a distant bigram. The contribution of the distant bigram is significant, outperforming the bigram model by 7.5%. Moreover, performance in the Shannon game is improved. We show in this article that our framework for combining language models is cheaper than one based on the maximum entropy principle. In addition, the selected histories for which one model is better than another have been collected and studied; almost all of them are beginnings of very frequently used French phrases. Finally, by using this principle, we achieve a better trigram model in terms of parameters and perplexity. This model is a combination of a bigram and a trigram based on a selected history.
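The per-history selection principle described in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the scoring function here is plain per-history perplexity, and the toy models and the `pick_model` helper are invented for demonstration.

```python
import math

def history_perplexity(prob, words):
    """Perplexity of a conditional model on the words observed after one history.

    prob: callable w -> P(w | h); words: the words that actually follow h
    in the corpus. Lower perplexity means the model predicts them better.
    """
    log_sum = sum(math.log2(prob(w)) for w in words)
    return 2.0 ** (-log_sum / len(words))

def pick_model(history, following_words, models):
    """Score each candidate model on this history and return the best one.

    models: dict of name -> callable (history, word) -> probability.
    Returns (name_of_best_model, all_scores).
    """
    scores = {name: history_perplexity(lambda w: m(history, w), following_words)
              for name, m in models.items()}
    return min(scores, key=scores.get), scores

# Toy models (hypothetical probabilities, for illustration only):
# a "bigram" that is confident about "chat" after "le", and a weaker
# "distant bigram" stand-in.
bigram = lambda h, w: 0.8 if (h, w) == ("le", "chat") else 0.05
distant_bigram = lambda h, w: 0.3 if (h, w) == ("le", "chat") else 0.1

best, scores = pick_model("le", ["chat", "chat", "chat"],
                          {"bigram": bigram, "distant_bigram": distant_bigram})
# The bigram's per-history perplexity is 1/0.8 = 1.25, the distant
# bigram's is 1/0.3 ≈ 3.33, so the bigram is selected for history "le".
```

In a full combined model, this comparison would be run once per selected history over the training corpus, and decoding would then dispatch each history to whichever model won.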
