Speech and Language Resources for LVCSR of Russian

Abstract

A syllable-based language model reduces the lexicon size by a factor of several hundred. This is especially beneficial for highly inflective languages such as Russian, where each word has many forms that vary with grammatical category. The main challenge, however, is concatenating the recognised syllables back into the originally spoken sentence or phrase, particularly when some syllables are misrecognised: natural fluent speech usually carries no clear information about word boundaries. This paper proposes and tests a method for syllable concatenation and error correction. It is based on a purpose-designed co-evolutionary asymptotic probabilistic genetic algorithm that determines the most likely sentence corresponding to the recognised chain of syllables within an acceptable time frame. The advantage of this modification is that it requires a minimal number of manually adjusted settings compared to the standard genetic algorithm. The data used for acoustic and language modelling are also described. A particular issue is the preprocessing of the textual data, especially the handling of abbreviations and of Arabic and Roman numerals, since their inflection depends mostly on context and grammar.
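To make the concatenation problem concrete, here is a minimal sketch of joining a syllable chain into words. It is not the paper's method: it uses plain dynamic programming rather than the co-evolutionary genetic algorithm, and it assumes an error-free syllable chain, so it performs no error correction. The lexicon, the transliterated syllables, the log-probability scores, and the function name `segment` are all hypothetical, chosen only for illustration.

```python
# Hypothetical toy lexicon: each word is a tuple of syllables mapped to a
# made-up log-probability (higher means more likely). Not data from the paper.
LEXICON = {
    ("mo", "lo", "ko"): -2.0,  # transliteration of "moloko"
    ("mo", "lo"): -4.0,
    ("ko", "ro", "va"): -2.5,  # transliteration of "korova"
    ("ko",): -5.0,
    ("ro", "va"): -4.5,
}
MAX_WORD_SYLLABLES = max(len(word) for word in LEXICON)


def segment(syllables):
    """Return (score, words) for the best-scoring split of the syllable
    chain into lexicon words, or None if no complete split exists."""
    syllables = tuple(syllables)
    n = len(syllables)
    # best[i] holds the best (score, words) covering the first i syllables.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_SYLLABLES), end):
            if best[start] is None:
                continue  # the prefix up to `start` cannot be segmented
            word = syllables[start:end]
            if word not in LEXICON:
                continue
            score = best[start][0] + LEXICON[word]
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + ["".join(word)])
    return best[n]


if __name__ == "__main__":
    print(segment(["mo", "lo", "ko", "ko", "ro", "va"]))
```

Even in this toy case the chain has several valid splits, and only the scores decide between them. With a realistic lexicon, a sentence-level language model, and misrecognised syllables, the search space grows quickly, which is the setting in which the abstract proposes a genetic algorithm.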

Bibliographic record

Source
LREC-2012 (International Conference on Language Resources and Evaluation) | 2012 | 4 pages
Authors
Sergey Zablotskiy; Alexander Shvets; Maxim Sidorov; Eugene Semenkin; Wolfgang Minker
Original format: PDF
Chinese Library Classification: 41.11083
Keywords
LVCSR; Russian; language modelling; sub-word units
