Modeling articulatory dynamics using HMM techniques for automatic speech recognition.

Abstract

State-of-the-art speech recognition uses stochastic models (Hidden Markov Models, or HMMs) to represent small, non-overlapping segments of speech, often referred to as "phonemes". In these conventional HMM speech recognizers, the control strategy does not draw on the underlying structure of speech but instead models the acoustics as a set of disjoint "segmental" units. Such a strategy neither accommodates the acoustic influence that phonemes have on neighboring phonemes nor attaches any meaning to the internal states of the model.

In this work, an alternative HMM control strategy is presented, which draws on the idea that speech production is a process governed by the mechanical motion of a finite set of relatively slow-moving articulators. The Articulatory Feature Model is defined as an HMM in which each internal state represents one point in the (quantized) articulatory space, that is, one possible configuration of the articulatory system. Rather than modeling disjoint acoustic segments, this model represents the acoustic patterns associated with the various articulatory configurations of the speech production system. Instead of a set of small, disjoint models, this scheme represents the entire vocabulary with a single, large HMM. Individual vocabulary items are specified as sequences of target articulatory configurations. The context dependency of phonemes is thus explicitly accommodated by the states representing articulatory configurations visited between articulatory targets. Because the internal model states correlate with the physical state of the production system, they have a potential real-world interpretation. This allows linguistic and physiological knowledge to be incorporated to restrict the model's evolution and improve performance.
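
The idea of states as quantized articulatory configurations, with words given as target sequences, can be illustrated with a small Python sketch. It is not the thesis's implementation: the feature names (jaw, lips, voicing), their levels, and the rule that only one feature moves by one level per step are hypothetical stand-ins for the actual articulatory space and physiological constraints. A word's targets are expanded into a state path by breadth-first search, so the intermediate configurations play the role of the context-dependent states visited between targets.

```python
from collections import deque
from itertools import product

# Hypothetical quantized articulatory features, each with ordered levels.
# These are illustrative assumptions, not the model's actual feature set.
FEATURES = {
    "jaw": (0, 1, 2),    # closed, mid, open
    "lips": (0, 1),      # spread, rounded
    "voicing": (0, 1),   # voiceless, voiced
}
NAMES = tuple(FEATURES)

# One HMM state per point in the quantized articulatory space (3 * 2 * 2 = 12).
STATES = list(product(*(FEATURES[name] for name in NAMES)))


def neighbours(state):
    """Yield configurations reachable in one step, under the assumed rule
    that exactly one feature moves by exactly one level."""
    for i, name in enumerate(NAMES):
        for delta in (-1, 1):
            level = state[i] + delta
            if level in FEATURES[name]:
                yield state[:i] + (level,) + state[i + 1:]


def shortest_path(start, goal):
    """Breadth-first search for a fewest-step trajectory from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for nxt in neighbours(state):
            if nxt not in previous:
                previous[nxt] = state
                queue.append(nxt)
    path = [goal]
    while previous[path[-1]] is not None:
        path.append(previous[path[-1]])
    return path[::-1]


def expand_word(targets):
    """Expand a word's target configurations into the full state sequence,
    including the intermediate configurations visited between targets."""
    sequence = [targets[0]]
    for a, b in zip(targets, targets[1:]):
        sequence.extend(shortest_path(a, b)[1:])
    return sequence


print(expand_word([(0, 0, 0), (2, 1, 1)]))
# → [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 1, 1)]
```

Here the hypothetical word with two targets passes through three intermediate configurations, because the jaw must move two levels and the lips and voicing one level each; those in-between states are where coarticulation would be modeled.
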
The development of the quantized articulatory space, the target articulatory feature sequences, and the feature evolution constraints for a large-vocabulary speech recognition system is presented. Recognition results for both small- and large-vocabulary tasks show that the articulatory feature scheme is competitive with the traditional phoneme model, offering a roughly 10% decrease in error rate relative to the phoneme model. Analysis of the model's behaviour indicates how model designers can capitalize on the physical interpretation of internal model states.

Bibliographic details

  • Author

    Erler, Kevin J.

  • Author affiliation

    University of Waterloo (Canada).

  • Degree-granting institution: University of Waterloo (Canada).
  • Subjects: Engineering, Electronics and Electrical; Computer Science; Artificial Intelligence.
  • Degree: Ph.D.
  • Year: 1994
  • Pages: 215
  • Original format: PDF
  • Language: English