Similar N-gram Language Model

Abstract

This paper describes an extension of the n-gram language model: the similar n-gram language model. The estimation of the probability P(s) of a string s by the classical model of order n is computed using statistics of occurrences of the last n words of the string in the corpus, whereas the proposed model further uses all the strings s' for which the Levenshtein distance to s is smaller than a given threshold. The similarity between s and each string s' is estimated using co-occurrence statistics. The new P(s) is approximated by smoothing all the similar n-gram probabilities with a regression technique. A slight but statistically significant decrease in the word error rate is obtained on a state-of-the-art automatic speech recognition system when the similar n-gram language model is interpolated linearly with the n-gram model.
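The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the co-occurrence similarity measure or the regression-based smoothing, so this sketch swaps in a simple distance-based weight `1 / (1 + d)` and a weighted average as stand-ins. All function names (`levenshtein`, `ngram_counts`, `similar_prob`, `interpolate`), the toy corpus, and the parameters `threshold` and `lam` are assumptions introduced for illustration only.

```python
from collections import Counter


def levenshtein(a, b):
    """Word-level edit distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def history_counts(counts):
    """Sum n-gram counts over their (n-1)-word histories."""
    hist = Counter()
    for gram, c in counts.items():
        hist[gram[:-1]] += c
    return hist


def mle_prob(gram, counts, hist):
    """Classical maximum-likelihood estimate P(last word | history)."""
    h = hist[gram[:-1]]
    return counts[gram] / h if h else 0.0


def similar_prob(gram, counts, hist, threshold):
    """Average the probabilities of observed n-grams whose edit distance
    to `gram` is smaller than `threshold`.

    ASSUMPTION: the weight 1 / (1 + d) is a placeholder for the paper's
    co-occurrence similarity, and the weighted average is a placeholder
    for its regression-based smoothing.
    """
    num = den = 0.0
    for other in counts:
        d = levenshtein(gram, other)
        if d < threshold:
            w = 1.0 / (1.0 + d)
            num += w * mle_prob(other, counts, hist)
            den += w
    return num / den if den else 0.0


def interpolate(p_ngram, p_similar, lam):
    """Linear interpolation of the classical and similar n-gram estimates."""
    return lam * p_ngram + (1.0 - lam) * p_similar


# Toy usage; the corpus and parameter values are hypothetical.
tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = ngram_counts(tokens, 3)
hist = history_counts(counts)
query = ("on", "the", "mat")
p_classic = mle_prob(query, counts, hist)
p_similar = similar_prob(query, counts, hist, threshold=2)
print(interpolate(p_classic, p_similar, lam=0.7))
```

The interpolation weight `lam` would normally be tuned on held-out data. The exhaustive scan over all observed n-grams in `similar_prob` is only practical for a toy corpus.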
