
Optimal generative and discriminative acoustic model training for speech recognition.


Abstract

The focus of this dissertation is to derive and demonstrate effective stochastic models for the speech recognition problem. Acoustic modeling for speech recognition typically involves representing the speech process within stochastic models. Modeling this high-frequency time series effectively is a fundamental problem.

The thesis proposes two such models, each developed to optimize the devised objective function. The first is an acoustic model formulated for the speech-in-noise problem. The second is a discriminatively trained model consisting of optimal discriminant ML estimators.

The first model is a combination of recognizers that, through a simple system fusion, combines multiple speech processes at the decision level. It is a stochastic modeling method devised to combine a parameterized spectral process based on missing data (MD) theory with a cepstral-based speech process, using a coupled hidden variable topology. Using a fused coupled hidden Markov model (HMM) topology, an optimal acoustic model is proposed that is inherently more robust than single-process models under noisy conditions. The theoretical capability of this model is tested under both stationary and non-stationary noise conditions. Under these test conditions, the fused model achieves higher recognition accuracy than the single-process models.

The second model is formulated with a methodology that segments the acoustic space appropriately for discriminatively trained models that optimize the devised objective function. This acoustic space is modeled with discriminant ML estimators formed with optimal decision boundaries using the large-margin support vector machine (SVM) learning method. These discriminatively trained models maximize the entropy of the observation space and are thereby able to model the speech process without loss. This is demonstrated experimentally with frame-level classification error rates of approximately 3% or less.

This dissertation devises an objective function that relates the true speech distribution to its estimate. It is shown that, by optimizing this function, the speech process time series can be modeled without loss of information.

Bibliographic record

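  • Author

    Joshi, Neil

  • Author affiliation

    Ryerson University (Canada)

  • Degree-granting institution: Ryerson University (Canada)
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2009
  • Pagination: 142 p.
  • Total pages: 142
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords:

  • Date added: 2022-08-17 11:38:01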
