Modeling spontaneous speech variability for large vocabulary continuous speech recognition

Abstract

In this work a number of novel techniques for improved treatment of spontaneous speech variabilities in large vocabulary automatic speech recognition are developed and evaluated on US English conversational speech and spontaneous medical dictations. Two main aspects of spontaneous speech modeling are addressed: the general handling of pronunciation variability and the individual and parallel treatment of multiple speech variabilities in the acoustic and pronunciation model of a one-pass speech recognizer.

The problem of an optimal incorporation of multiple alternative pronunciations into the search framework is addressed in the first part of the thesis. This includes the question of how to efficiently combine the probabilistic contributions of alternative pronunciations in the course of a left-to-right search procedure. The well-known maximum approximation, usually applied in this context, is compared to a novel time-synchronous sum approximation technique which integrates alternative pronunciations in a weighted sum of acoustic probabilities. It is shown on a conversational speech task that this approach outperforms the maximum approximation by 2% relative and reduces the search costs by 7%.

Another important issue with respect to the incorporation of alternative pronunciations into the search framework is the statistical weighting of the pronunciations. The usually applied pronunciation unigram prior probabilities are typically estimated by the relative frequencies of pronunciations in the training hypotheses. This standard maximum likelihood solution is compared to a novel discriminative training scheme which is an extension of the Discriminative Model Combination technique, proposed in [Beyerlein 01]. The developed iterative reestimation procedure is shown to adjust the influence of a specific pronunciation prior probability in the discriminant function in dependence of (1) the word error rate, (2) the frequency of occurrence of this pronunciation in the correct hypothesis and its rivals, and (3) the underlying acoustic, pronunciation and language model. An evaluation of this technique on a conversational speech task showed a 6.5% relative improvement on the training corpus and a 2% relative gain on an independent test set.

The second major part of this thesis addresses the development and evaluation of a novel training and search framework which enables a specific, parallel treatment of multiple speech variabilities in the acoustic and pronunciation model. This technique (1) classifies portions of speech (e.g. words) with respect to given variability classes (e.g. rate of speech), (2) builds class-specific acoustic and pronunciation models, and (3) properly combines these models later in the search procedure on a word-level basis. A theoretical framework for an efficient integration of the class-specific acoustic and pronunciation models into a one-pass search procedure is developed which incorporates contributions from class-specific alternatives in a weighted sum of acoustic probabilities. This multi-variability framework applies a very general model combination technique which may be applied to combine arbitrary acoustic and pronunciation models on word level. In this work, it is especially used for a parallel, explicit treatment of three important spontaneous speech variabilities: pronunciation variability, rate of speech variability, and filled pause variability. The best multi-variability system combines 6 class-specific acoustic and pronunciation models on word level and achieves a word error rate reduction of 13% relative on a highly spontaneous medical dictation task and a gain of 9% relative on conversational speech.
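The contrast between the maximum approximation and the sum approximation described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it scores a single word in isolation rather than time-synchronously inside a one-pass search. The function names, the toy word "the", its two pronunciations and the acoustic log-likelihood values are all hypothetical and chosen only for illustration. The priors are estimated by relative frequency, which is the standard maximum likelihood solution the abstract mentions.

```python
import math
from collections import Counter


def pronunciation_priors(observed):
    """Estimate p(pron | word) by relative frequency.

    `observed` is a list of (word, pronunciation) pairs, standing in for
    the pronunciations seen in the training hypotheses (hypothetical data).
    """
    pair_counts = Counter(observed)
    word_counts = Counter(word for word, _ in observed)
    return {(w, v): c / word_counts[w] for (w, v), c in pair_counts.items()}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def word_score(word, acoustic_log_likes, priors, mode="sum"):
    """Combine the pronunciation variants of one word into one log score.

    acoustic_log_likes maps each pronunciation to log p(x | pron).
    mode="max": keep only the best variant (maximum approximation).
    mode="sum": weighted sum over variants (sum approximation),
                computed in the log domain.
    """
    terms = [
        math.log(priors[(word, v)]) + ll
        for v, ll in acoustic_log_likes.items()
    ]
    if mode == "max":
        return max(terms)
    return log_sum_exp(terms)


# Hypothetical toy data: "the" seen three times as "dh ah", once as "dh iy".
observed = [("the", "dh ah")] * 3 + [("the", "dh iy")]
priors = pronunciation_priors(observed)

# Hypothetical acoustic log-likelihoods for one speech segment.
acoustic = {"dh ah": -10.0, "dh iy": -10.5}

max_score = word_score("the", acoustic, priors, mode="max")
sum_score = word_score("the", acoustic, priors, mode="sum")
print("max approximation:", max_score)
print("sum approximation:", sum_score)
```

The sum score is never lower than the max score, because it adds the probability mass of every variant instead of discarding all but the best one. The same weighted-sum form is what the abstract describes for combining class-specific alternatives; in that case the weights would be class-dependent rather than plain pronunciation priors.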

Bibliographic record

  • Author: Schramm Hauke
  • Year: 2006
  • Original format: PDF
  • Language: English (eng)
