Conference: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing

Automatically finding semantically consistent n-grams to add new words in LVCSR systems

Abstract

This paper presents a new method for automatically adding n-grams that contain out-of-vocabulary (OOV) words to a baseline language model (LM). The added n-grams are intended to be grammatically correct and consistent with the meaning of the OOV words. The method first determines the word sequences, i.e., n-grams, in which the usage of a given OOV word is most semantically consistent, and then computes the conditional probabilities of these n-grams. To do so, semantic relations between words are used to map each OOV word to several equivalent in-vocabulary words. Based on these equivalent words, n-grams from the baseline LM are reused to find the word sequences to add and to compute their probabilities. Experiments in which the vocabulary is augmented and recognition is rerun show that the method yields word error rate (WER) improvements comparable to those obtained with a state-of-the-art open-vocabulary LM.
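The n-gram reuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how equivalent words are chosen or how the probabilities are combined, so the function name `add_oov_ngrams`, the similarity weights, and the weighted-average rule below are all assumptions made for illustration.

```python
from collections import defaultdict


def add_oov_ngrams(baseline_ngrams, equivalents):
    """Build n-grams containing OOV words from baseline LM n-grams.

    baseline_ngrams: dict mapping an n-gram (tuple of words) to its
        conditional probability in the baseline LM.
    equivalents: dict mapping each OOV word to a dict of
        {in-vocabulary word: similarity weight} (hypothetical weights).

    Returns a dict mapping each new n-gram to an estimated probability.
    ASSUMPTION: the estimate is a similarity-weighted average of the
    probabilities of the source n-grams; the paper may do this differently.
    """
    new_ngrams = defaultdict(float)
    for oov, equiv in equivalents.items():
        total_weight = sum(equiv.values())
        for ngram, prob in baseline_ngrams.items():
            for pos, word in enumerate(ngram):
                if word in equiv:
                    # Substitute the OOV word for its in-vocabulary equivalent.
                    candidate = ngram[:pos] + (oov,) + ngram[pos + 1:]
                    new_ngrams[candidate] += (equiv[word] / total_weight) * prob
    return dict(new_ngrams)


if __name__ == "__main__":
    # Toy data, invented for this example only.
    baseline = {
        ("drink", "coffee"): 0.2,
        ("drink", "tea"): 0.1,
        ("hot", "tea"): 0.4,
    }
    equivalents = {"matcha": {"tea": 3.0, "coffee": 1.0}}
    for ngram, prob in sorted(add_oov_ngrams(baseline, equivalents).items()):
        print(ngram, round(prob, 4))
```

The sketch does not renormalize the augmented model, so each history's distribution would no longer sum to one. A real system would need to redistribute probability mass and recompute back-off weights after adding the new n-grams.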
