Annual Conference of the International Speech Communication Association

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR

Abstract

For languages such as German and Polish, the large number of word inflections leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Recognizing an open vocabulary is therefore one of the main challenges in large vocabulary continuous speech recognition (LVCSR). In this paper, we investigate the use of mixed types of sub-word units in the same recognition lexicon: morphemic or syllabic units combined with their pronunciations (so-called graphones), plain graphemic morphemes or syllables, and full words. In addition, we investigate the suitability of hybrid mixed-unit N-grams as features for a Maximum Entropy LM, combined with adaptation. Compared to the conventional full-word approach and to a state-of-the-art hybrid N-gram LM with mixed unit types, we achieve significant improvements in OOV recognition and reductions in word error rate for German and Polish LVCSR.
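The two ideas in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the unit inventories (`FULL_WORDS`, `SUBWORDS`), the helper names, and the feature weights are all hypothetical. In-vocabulary words stay whole, while an OOV compound is split into marked sub-word units. A Maximum Entropy model then scores the next unit as p(w | h) = exp(Σ λ_i f_i(h, w)) / Z(h), using unigram and bigram indicator features over that mixed-unit stream. The greedy longest-match splitter stands in for a real morphological or syllable segmenter. Graphones, which pair each unit with its pronunciation, are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy inventories (NOT from the paper): full words kept whole,
# and sub-word units (e.g. morphemes/syllables) used to spell out OOV words.
FULL_WORDS = {"ich", "habe", "das", "haus"}
SUBWORDS = {"haus", "tür", "en", "schlüssel"}


def segment_oov(word: str, units: set) -> List[str]:
    """Greedy longest-match split of an OOV word into sub-word units.

    Non-final pieces get a trailing '+' so the full word can be rebuilt
    after recognition. Returns [word] unchanged if the inventory cannot
    cover it (greedy matching may miss splits a smarter segmenter finds).
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest unit first
            if word[i:j] in units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            return [word]  # no unit matches at position i
    return [p + "+" for p in pieces[:-1]] + [pieces[-1]]


def to_hybrid(words: Sequence[str]) -> List[str]:
    """Map a word sequence to a mixed full-word / sub-word token stream."""
    out: List[str] = []
    for w in words:
        out.extend([w] if w in FULL_WORDS else segment_oov(w, SUBWORDS))
    return out


def maxent_prob(prev: str, word: str, vocab: Sequence[str],
                weights: Dict[Tuple[str, ...], float]) -> float:
    """p(word | prev) under a log-linear (MaxEnt) model whose features are
    unigram (w,) and bigram (prev, w) indicators; absent features weigh 0."""
    def score(w: str) -> float:
        return weights.get((w,), 0.0) + weights.get((prev, w), 0.0)

    z = sum(math.exp(score(w)) for w in vocab)  # normalizer Z(h)
    return math.exp(score(word)) / z
```

For example, `to_hybrid(["das", "haustür"])` yields `["das", "haus+", "tür"]`, and with a bigram weight on `("haus+", "tür")` the model prefers `tür` after `haus+`. In a real system the weights would be trained (and possibly adapted) on mixed-unit text rather than set by hand.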
