Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Abstract

Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to highly inflectional languages such as Polish. When resources are scarce (i.e., the corpora are small), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of a morpheme-based model for Polish. In this paper, we present the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.
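The out-of-vocabulary comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`naive_syllables`, `to_units`, `oov_rate`), the toy word lists, and above all the vowel-based splitting rule, which is a crude stand-in for real Polish syllabification (for example, it ignores that "i" before a vowel usually marks palatalization rather than a separate syllable).

```python
from typing import Callable, Iterable, List

# Polish vowel letters; used only by the crude splitter below.
VOWELS = set("aąeęioóuy")


def naive_syllables(word: str) -> List[str]:
    """Crude, hypothetical splitter: close a chunk after every vowel.

    Trailing consonants are attached to the last chunk. This is NOT a
    linguistically correct Polish syllabifier.
    """
    chunks: List[str] = []
    current = ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            chunks.append(current)
            current = ""
    if current:
        if chunks:
            chunks[-1] += current
        else:
            chunks.append(current)
    return chunks


def to_units(words: Iterable[str], segment: Callable[[str], List[str]]) -> List[str]:
    """Flatten a word sequence into a sequence of sub-word units."""
    return [unit for word in words for unit in segment(word)]


def oov_rate(train_units: Iterable[str], test_units: Iterable[str]) -> float:
    """Fraction of test tokens whose unit never occurs in the training data."""
    vocab = set(train_units)
    test_list = list(test_units)
    if not test_list:
        return 0.0
    return sum(1 for unit in test_list if unit not in vocab) / len(test_list)


# Toy data (invented): the test set contains inflected forms unseen in training.
train_words = ["mamy", "taty", "kota", "domu"]
test_words = ["tata", "koty", "domy", "mama"]

word_oov = oov_rate(train_words, test_words)
syllable_oov = oov_rate(
    to_units(train_words, naive_syllables),
    to_units(test_words, naive_syllables),
)

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"syllable-level OOV rate: {syllable_oov:.2f}")
```

On this deliberately contrived data, every test word is unseen at the word level, while all of its syllables already occur in the training set. Real corpora will show a smaller gap, and the size of that gap depends on the syllabification scheme and the corpus size.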

Bibliographic details

Source: International Conference on Text, Speech and Dialogue, 2008, 5 pages
Author: Piotr Majewski
Original format: PDF
CLC classification: TP18-53
Keywords: Polish; Large vocabulary continuous speech recognition; Language modeling; Sub-word units; Syllable-based units

