Journal: Speech Communication

Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition



Abstract

Variations in rate-of-speech (ROS) produce variations in both spectral features and word pronunciations that affect automatic speech recognition systems. To deal with these ROS effects, we propose using a set of parallel rate-specific acoustic and pronunciation models. Rate switching is permitted at word boundaries to allow within-sentence speech rate variation, which is common in conversational speech. Because of the parallel structure of the rate-specific models and the maximum likelihood decoding method, our approach does not require ROS estimation before recognition, which is hard to achieve. We evaluate our models on a large vocabulary conversational telephone speech recognition task. Experiments on the NIST 2000 Hub-5 development set show that word-level ROS-dependent modeling yields a 2.2% absolute reduction in word error rate over a rate-independent baseline system. Relative to an enhanced baseline system that models cross-word phonetic elision and reduction in a multiword dictionary, rate-dependent models achieve an absolute improvement of 1.5%. Furthermore, we introduce a novel method for modeling the reduced pronunciations that are common in fast speech, based on skipping short phones in the pronunciation models while preserving the phonetic context for the adjacent phones. This method is also shown to produce a small additional improvement on top of ROS-dependent acoustic modeling.
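To make the idea of rate switching at word boundaries concrete, here is a minimal Python sketch. It is a toy illustration, not the authors' system: it assumes the word sequence is already fixed, that each word has a log-likelihood under a hypothetical "slow" and "fast" model, and it adds an optional `switch_penalty` that the abstract does not mention. All names and scores are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rate classes; the abstract does not say how many are used.
RATES = ("slow", "fast")


def decode_rates(
    word_scores: List[Dict[str, float]],
    switch_penalty: float = 0.0,
) -> Tuple[float, List[str]]:
    """Pick one rate label per word by maximum likelihood (Viterbi).

    word_scores[i][r] is the log-likelihood of word i under the
    rate-specific model r. The rate may change only between words.
    switch_penalty is an assumed cost (in log units) for changing rate.
    """
    if not word_scores:
        raise ValueError("word_scores must contain at least one word")

    # Best log-score of any rate path ending in each rate at the first word.
    best = {r: word_scores[0][r] for r in RATES}
    backpointers: List[Dict[str, str]] = []

    for scores in word_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for r in RATES:
            # Score of arriving at rate r from each possible previous rate.
            candidates = {
                p: best[p] - (switch_penalty if p != r else 0.0)
                for p in RATES
            }
            prev = max(candidates, key=candidates.get)
            new_best[r] = candidates[prev] + scores[r]
            pointers[r] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back the best rate sequence from the best final rate.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


if __name__ == "__main__":
    toy_scores = [
        {"slow": -10.0, "fast": -12.0},
        {"slow": -9.0, "fast": -5.0},
        {"slow": -7.0, "fast": -8.0},
    ]
    print(decode_rates(toy_scores))
    print(decode_rates(toy_scores, switch_penalty=5.0))
```

Because the rate label is chosen inside the search, no separate ROS estimate is needed beforehand, which is the property the abstract highlights. In a full recognizer the word identities would be searched jointly with the rate labels; the sketch leaves that out for brevity.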
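The phone-skipping idea from the abstract (dropping short phones from a pronunciation while keeping the phonetic context of their neighbours) can be sketched as follows. This is one possible reading, written in Python under stated assumptions: the triphone representation, the `sil` boundary symbol, the example pronunciation, and the choice of which phones count as skippable are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (left context, phone, right context); an assumed representation.
Triphone = Tuple[str, str, str]


def to_triphones(phones: List[str], boundary: str = "sil") -> List[Triphone]:
    """Expand a phone sequence into context-dependent triphones.

    The boundary symbol used at word edges is an assumption.
    """
    padded = [boundary] + phones + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def reduced_variant(phones: List[str], skippable: Set[str]) -> List[Triphone]:
    """Build a reduced pronunciation by skipping the given phones.

    Contexts are computed from the FULL pronunciation first, so the
    neighbours of a skipped phone keep the context they had originally.
    """
    full = to_triphones(phones)
    return [tri for tri in full if tri[1] not in skippable]


if __name__ == "__main__":
    # Illustrative pronunciation and skippable set, not from the paper.
    probably = ["p", "r", "aa", "b", "ax", "b", "l", "iy"]
    for tri in reduced_variant(probably, skippable={"ax"}):
        print(tri)
```

In this sketch the phone before the skipped `ax` is still modeled as `(aa, b, ax)` rather than being relabeled as `(aa, b, b)`, so the reduced variant reuses acoustic models from the full pronunciation instead of requiring new context-dependent units.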
