Latin American Applied Research

A new estimator based on maximum entropy


Abstract

In this paper, we propose a new formulation of the classical Good-Turing estimator for n-gram language models. The new approach is based on defining a dynamic model of language production. Instead of assuming a fixed probability distribution for the occurrence of an n-gram over the whole text, we propose a maximum entropy approximation of a time-varying distribution. This approximation leads to a new distribution, which in turn is used to compute the expectations in the Good-Turing estimator. The result is a new estimator that we call the Maximum Entropy Good-Turing estimator. In contrast to the classical Good-Turing estimator, the new formulation requires neither approximated expectations nor windowing or other smoothing techniques. It also contains the well-known discounting estimators as special cases. Performance is evaluated both in terms of perplexity and word error rate in an N-best rescoring task, and the new estimator is compared with other classical estimators. In all cases our approach performs significantly better than the classical estimators.
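For context, the classical Good-Turing recipe that this work reformulates replaces an n-gram's observed count r with the adjusted count r* = (r+1)·N_{r+1}/N_r, where N_r is the number of n-gram types observed exactly r times. A minimal sketch of that classical computation (illustrative only; the function name and the fallback used when N_{r+1} is empty are our own choices, not part of the paper, which avoids this gap altogether):

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    """Classical Good-Turing: r* = (r + 1) * N_{r+1} / N_r, where N_r is the
    number of types seen exactly r times."""
    freq_of_freq = Counter(counts.values())  # N_r for each observed r
    adjusted = {}
    for ngram, r in counts.items():
        n_r = freq_of_freq[r]
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        # When N_{r+1} == 0 the ratio is undefined; the classical recipe needs
        # smoothing of the N_r sequence here. We simply keep the raw count r
        # (an assumption for this sketch, not the paper's solution).
        adjusted[ngram] = (r + 1) * n_r_plus_1 / n_r if n_r_plus_1 else float(r)
    return adjusted

# Toy unigram example
counts = Counter("the cat sat on the mat the cat".split())
adj = good_turing_adjusted_counts(counts)
# Singletons ("sat", "on", "mat") are discounted from 1 toward 2 * N_2 / N_1.
```

The zero-N_{r+1} branch above is exactly where classical implementations resort to windowing or count smoothing, which is the step the proposed maximum-entropy formulation eliminates.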
