Conference: International Conference on Text, Speech and Dialogue

Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Abstract

Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to highly inflectional languages such as Polish. When resources are scarce (i.e., the corpora are small), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of a morpheme-based model for Polish. In our paper, we present the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.
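The out-of-vocabulary comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `oov_rate`, the toy word lists, and the hand-written syllable splits are all assumptions made for illustration, and a real system would need a proper Polish syllabification procedure and far larger corpora.

```python
def oov_rate(train_units, test_units):
    """Fraction of test tokens whose unit never occurs in the training data."""
    vocab = set(train_units)
    if not test_units:
        return 0.0
    missing = sum(1 for unit in test_units if unit not in vocab)
    return missing / len(test_units)


# Hypothetical, hand-written syllable splits (assumption: a real system
# would use an automatic syllabifier instead of this lookup table).
SYLLABLES = {
    "mata": ["ma", "ta"],
    "tama": ["ta", "ma"],
    "kota": ["ko", "ta"],
    "domu": ["do", "mu"],
    "mama": ["ma", "ma"],
    "tata": ["ta", "ta"],
    "kotu": ["ko", "tu"],
}


def to_syllables(words):
    """Flatten a list of words into a list of syllable tokens."""
    return [syllable for word in words for syllable in SYLLABLES[word]]


# Toy "corpora": a tiny training set and unseen word forms in the test set.
train_words = ["mata", "tama", "kota", "domu"]
test_words = ["mama", "tata", "kotu"]

word_oov = oov_rate(train_words, test_words)
syllable_oov = oov_rate(to_syllables(train_words), to_syllables(test_words))

print(f"word-level OOV rate:     {word_oov:.2f}")      # 1.00
print(f"syllable-level OOV rate: {syllable_oov:.2f}")  # 0.17
```

In this toy setup every test word is unseen at the word level, while only one of the six test syllables ("tu") is missing from the training data. The numbers only illustrate the effect described in the abstract and say nothing about the results reported in the paper.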
