INTERSPEECH 2012

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR

Abstract

For languages like German and Polish, the large number of word inflections leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Thus, one of the main challenges in large vocabulary continuous speech recognition (LVCSR) is recognizing an open vocabulary. In this paper, we investigate the use of mixed types of sub-word units in the same recognition lexicon: morphemic or syllabic units combined with pronunciations, called graphones; normal graphemic morphemes or syllables; and full words. In addition, we investigate the suitability of hybrid mixed-unit N-grams as features for a Maximum Entropy LM, along with adaptation. We achieve significant improvements in recognizing OOVs, and word error rate reductions, for German and Polish LVCSR compared to the conventional full-word approach and state-of-the-art N-gram mixed
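
To make the OOV problem in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It compares the OOV rate of a closed full-word vocabulary with that of a hybrid lexicon in which unseen words may be composed from sub-word units. The function names (`oov_rate`, `covered_by_subwords`, `hybrid_oov_rate`) and the toy German-like data are assumptions made for illustration, and the sketch works on spelling only, ignoring the pronunciations that graphones carry.

```python
from typing import Iterable, Set


def oov_rate(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of running tokens that are not in the vocabulary."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocabulary)
    return missing / len(tokens)


def covered_by_subwords(word: str, subwords: Set[str]) -> bool:
    """True if `word` can be split entirely into units from `subwords`.

    Uses dynamic programming over prefix lengths: reachable[i] is True
    when word[:i] can be segmented into known sub-word units.
    """
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in subwords:
                reachable[end] = True
                break
    return reachable[len(word)]


def hybrid_oov_rate(tokens: Iterable[str], full_words: Set[str],
                    subwords: Set[str]) -> float:
    """OOV rate when a token counts as known if it is a full word in the
    lexicon or can be composed from sub-word units (hypothetical helper)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(
        1 for t in tokens
        if t not in full_words and not covered_by_subwords(t, subwords)
    )
    return missing / len(tokens)


if __name__ == "__main__":
    # Toy data (assumed): "hausboot" is unseen as a full word but can be
    # built from the units "haus" + "boot"; "garten" cannot be built.
    test_tokens = ["haus", "hausboot", "boot", "garten"]
    full_words = {"haus", "boot"}
    subwords = {"haus", "boot"}
    print(oov_rate(test_tokens, full_words))                   # 0.5
    print(hybrid_oov_rate(test_tokens, full_words, subwords))  # 0.25
```

In this toy setting, allowing sub-word composition halves the OOV rate, which illustrates why mixing sub-word units and full words in one lexicon can help for highly inflected languages.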
