Conference: International Conference on Spoken Language Processing

Practical Language Modeling: An Interpolating Method

Abstract

Language modeling is a key component in speech and handwriting recognition. N-gram language modeling is the formalism of choice for a wide range of domains. Although a high order N can reduce perplexity greatly, in many practical cases it is unrealistic to obtain statistically reliable N-grams. We propose an interpolated model that introduces signal words and clue words into the baseline N-gram model. The initial word in a word pair with high mutual information is chosen as a signal word. In the same way, we define clue words as words that have high mutual information with a certain morphological form. In a given context, we select the signal word with the highest score to compute the probability of the current word, and the clue word with the highest score to estimate the probability of the form of the current word. We discuss the basic requirements of designing an interpolating language model and examine how our models satisfy them. We obtain a considerable reduction in perplexity compared to the baseline model. Because both signal words and clue words are easy to collect and handle, the proposed method is practical.
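The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes pointwise mutual information over ordered word pairs within a fixed window as the "score", an add-one-smoothed bigram as the baseline, and a fixed interpolation weight `lam`. All function names, the window size, the thresholds, and the toy corpus are hypothetical. Clue words for morphological forms are not covered.

```python
import math
from collections import Counter

def train(sentences, window=5):
    """Collect unigram, bigram, bigram-context, and windowed ordered-pair counts."""
    uni, bi, ctx, pair = Counter(), Counter(), Counter(), Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            uni[w] += 1
            if i > 0:
                bi[(sent[i - 1], w)] += 1
                ctx[sent[i - 1]] += 1  # times the previous word is followed by something
            # pair each earlier word inside the window with the current word
            for j in range(max(0, i - window), i):
                pair[(sent[j], w)] += 1
    return uni, bi, ctx, pair

def pmi_scores(uni, pair, min_count=2):
    """Pointwise mutual information for ordered pairs seen at least min_count times."""
    n_uni, n_pair = sum(uni.values()), sum(pair.values())
    return {
        (a, b): math.log((c / n_pair) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        for (a, b), c in pair.items()
        if c >= min_count
    }

def signal_words(scores, threshold=0.0):
    """Map the first word of each high-PMI pair to its best score (assumed scoring)."""
    best = {}
    for (a, _), s in scores.items():
        if s > threshold:
            best[a] = max(s, best.get(a, s))
    return best

def interpolated_prob(word, history, counts, signals, lam=0.3):
    """Mix a signal-word estimate with the baseline bigram estimate."""
    uni, bi, ctx, pair = counts
    vocab = len(uni)
    prev = history[-1]
    p_bigram = (bi[(prev, word)] + 1) / (ctx[prev] + vocab)  # add-one smoothing
    candidates = [h for h in history if h in signals]
    if not candidates:
        return p_bigram  # no signal word in context: fall back to the baseline
    s = max(candidates, key=lambda h: signals[h])  # highest-scoring signal word
    total = sum(c for (a, _), c in pair.items() if a == s)
    p_signal = (pair[(s, word)] + 1) / (total + vocab)
    return lam * p_signal + (1 - lam) * p_bigram

# Toy corpus, purely illustrative.
corpus = [
    "the doctor said the patient needs rest".split(),
    "the doctor saw the patient today".split(),
    "the bank raised the interest rate".split(),
    "the bank cut the interest rate today".split(),
]
counts = train(corpus)
signals = signal_words(pmi_scores(counts[0], counts[3]))
p_with_signal = interpolated_prob("rate", ["the", "bank", "raised", "the"], counts, signals)
p_baseline = interpolated_prob("rate", ["the", "bank", "raised", "the"], counts, signals, lam=0.0)
```

In this sketch the signal word is chosen from the history alone, so both component estimates, and therefore their mixture, sum to one over the vocabulary. On the toy corpus, the signal word "bank" raises the probability of "rate" above the bigram baseline even though "rate" never directly follows "the".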