Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition

Jing Zheng; Horacio Franco; Andreas Stolcke

Abstract

Variations in rate-of-speech (ROS) produce variations in both spectral features and word pronunciations that affect automatic speech recognition systems. To deal with these ROS effects, we propose a set of parallel rate-specific acoustic and pronunciation models. Rate switching is permitted at word boundaries to allow within-sentence speech rate variation, which is common in conversational speech. Because of the parallel structure of the rate-specific models and the maximum likelihood decoding method, our approach does not require ROS estimation before recognition, which is difficult to achieve. We evaluate our models on a large vocabulary conversational telephone speech recognition task. Experiments on the NIST 2000 Hub-5 development set show that word-level ROS-dependent modeling yields a 2.2% absolute reduction in word error rate over a rate-independent baseline system. Relative to an enhanced baseline system that models cross-word phonetic elision and reduction in a multiword dictionary, rate-dependent models achieve an absolute improvement of 1.5%. Furthermore, we introduce a novel method for modeling the reduced pronunciations that are common in fast speech: short phones are skipped in the pronunciation models while the phonetic context of the adjacent phones is preserved. This method produces a small additional improvement on top of ROS-dependent acoustic modeling.
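
The role of maximum likelihood decoding in choosing a rate per word can be illustrated with a minimal sketch. It assumes, purely for illustration, that the word sequence and segmentation are already fixed and that each word has been scored under each rate-specific model; the system described in the abstract would instead search over words and rates together inside the recognizer. The rate labels ("slow", "fast"), the function name `best_rate_path`, and the `switch_penalty` parameter are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def best_rate_path(
    word_scores: List[Dict[str, float]],
    switch_penalty: float = 0.0,
) -> Tuple[float, List[str]]:
    """Pick one rate label per word so that the total log-likelihood is maximal.

    word_scores[i][r] is the (assumed, precomputed) log-likelihood of word i
    under the rate-specific model r. switch_penalty is a hypothetical cost
    subtracted whenever the rate changes at a word boundary.
    """
    if not word_scores:
        return 0.0, []

    rates = list(word_scores[0])
    # best[r] = (score of the best path ending in rate r, that path)
    best = {r: (word_scores[0][r], [r]) for r in rates}

    for scores in word_scores[1:]:
        new_best = {}
        for r in rates:
            # Best predecessor, charging the penalty only on a rate switch.
            prev_score, prev_path = max(
                (
                    (s - (switch_penalty if p != r else 0.0), path)
                    for p, (s, path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[r] = (prev_score + scores[r], prev_path + [r])
        best = new_best

    return max(best.values(), key=lambda item: item[0])


# Three words, each scored under a "slow" and a "fast" model (made-up numbers).
example = [
    {"slow": -10.0, "fast": -12.0},
    {"slow": -9.0, "fast": -7.0},
    {"slow": -8.0, "fast": -8.5},
]
score, path = best_rate_path(example)
print(score, path)
```

With `switch_penalty` left at zero the search reduces to taking the better-scoring rate model for each word independently, which is why no separate ROS estimate is needed before decoding in this simplified setting.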

Bibliographic details

Source
Speech Communication, 2003, No. 3, pp. 273-285 (13 pages)
Authors
Jing Zheng; Horacio Franco; Andreas Stolcke
Affiliation
Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA (all three authors)
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Language and writing
Keywords
rate-of-speech modeling; large vocabulary conversational speech recognition; pronunciation modeling
