Target-directed mixture dynamic models for spontaneous speech recognition

Ma J.Z.; Li Deng

Abstract

In this paper, a novel mixture linear dynamic model (MLDM) for speech recognition is developed and evaluated. Several linear dynamic models are combined (mixed) to represent different vocal-tract-resonance (VTR) dynamic behaviors and the mapping relationships between the VTRs and the acoustic observations. Each linear dynamic model is formulated as a set of state-space equations: the target-directed property of the VTRs is incorporated in the state equation, and a linear regression function serves as the observation equation, approximating the nonlinear mapping relationship. A version of the generalized EM algorithm is developed for learning the model parameters, and the constraint that the VTR targets change at the segmental level (rather than at the frame level) is imposed in both the parameter learning and model scoring algorithms. Speech recognition experiments are carried out to evaluate the new model using the N-best rescoring paradigm in a Switchboard task. Compared with a baseline recognizer using the triphone HMM acoustic model, the new recognizer demonstrated improved performance under several experimental conditions. Performance was shown to improve as the number of mixture components in the model increased.
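The abstract does not give the equations, so the following is only a minimal sketch of what a single target-directed linear dynamic component might look like, under stated assumptions. It assumes a scalar state, a state equation of the form z[k+1] = phi * z[k] + (1 - phi) * target + w[k], and a linear observation equation o[k] = H * z[k] + h + v[k]. The function name `simulate_segment` and all parameter names and values are hypothetical and are not taken from the paper. The target is held fixed for the whole segment, mirroring the segment-level constraint described above.

```python
import random


def simulate_segment(z0, target, phi, H, h, n_frames,
                     q_std=0.0, r_std=0.0, seed=0):
    """Simulate one segment of a scalar target-directed linear dynamic model.

    Assumed (illustrative) form, not confirmed by the source:
        state:        z[k+1] = phi * z[k] + (1 - phi) * target + w[k]
        observation:  o[k]   = H * z[k] + h + v[k]
    The target stays constant over all frames of the segment.
    """
    rng = random.Random(seed)
    z = z0
    states, observations = [], []
    for _ in range(n_frames):
        states.append(z)
        # Linear regression from the hidden state to the acoustic observation.
        observations.append(H * z + h + rng.gauss(0.0, r_std))
        # Target-directed update: the state relaxes toward the target.
        z = phi * z + (1.0 - phi) * target + rng.gauss(0.0, q_std)
    return states, observations


if __name__ == "__main__":
    # Two hypothetical mixture components that differ only in how fast
    # they approach the same target (smaller phi means faster approach).
    for phi in (0.9, 0.6):
        states, _ = simulate_segment(z0=500.0, target=1500.0, phi=phi,
                                     H=1.0, h=0.0, n_frames=10)
        print(phi, [round(s, 1) for s in states])
```

In a mixture of such components, each component would carry its own dynamics and observation parameters; how the components are weighted and trained is the subject of the paper's generalized EM algorithm and is not sketched here.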

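Bibliographic record

Source: IEEE Transactions on Speech and Audio Processing, 2004, No. 1, pp. 47-58 (12 pages)
Authors: Ma J.Z.; Li Deng
Original format: PDF
Language: English
Chinese Library Classification: Electroacoustic technology and speech signal processing
Keywords: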
hidden Markov models; parameter estimation; regression analysis; speech processing; speech recognition; state-space methods; N best rescoring paradigm; baseline recognizer; generalized EM algorithm; linear regression function; mixture linear dynamic model; nonlinea;

Date added to database: 2022-08-18 00:13:04

Similar literature

1. Target-directed mixture dynamic models for spontaneous speech recognition [J]. Ma J.Z., Li Deng. IEEE Transactions on Speech and Audio Processing. 2004, No. 1
2. State-dependent phonetic tied mixtures with pronunciation modeling for spontaneous speech recognition [J]. Yi Liu, Fung P. IEEE Transactions on Speech and Audio Processing. 2004, No. 4
3. Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics [J]. Li Deng, Jeff Ma. The Journal of the Acoustical Society of America. 2000, No. 6
4. A mixture linear model with target-directed dynamics for spontaneous speech recognition [C]. Ma, J., Li Deng. 2002
5. Mixtures of inverse covariances: Covariance modeling for Gaussian mixtures with applications to automatic speech recognition [D]. Vanhoucke, Vincent. 2004
6. Detecting Manic State of Bipolar Disorder Based on Support Vector Machine and Gaussian Mixture Model Using Spontaneous Speech [O]. Zhongde Pan, Chao Gui, Jing Zhang. 2018
7. Target-directed mixture dynamic models for spontaneous speech recognition [O]. Jeff Z. Ma, Li Deng. 2015
