Integration of articulatory knowledge and voicing features based on DNN/HMM for Mandarin speech recognition



Abstract

Speech production knowledge has been used successfully to enhance phonetic representations and the performance of automatic speech recognition (ASR) systems. Representations of speech production offer simple explanations for many phenomena observed in speech that cannot be easily analyzed from the acoustic signal or the phonetic transcription alone. One of the most important aspects of speech production knowledge is articulatory knowledge, which describes the smooth, continuous movements of the vocal tract. In this paper, we present a new articulatory model that provides useful information for rescoring the hypotheses in a speech recognition lattice. The articulatory model consists of a feature front-end, which computes a voicing feature based on a spectral harmonics correlation (SHC) function, and a back-end that combines deep neural networks (DNNs) with hidden Markov models (HMMs). The voicing features are combined with standard Mel-frequency cepstral coefficients (MFCCs) using heteroscedastic linear discriminant analysis (HLDA) to improve recognition accuracy. The algorithm draws on the strengths of both models: it retains the deep learning properties of DNNs while modeling articulatory context through HMMs. Mandarin speech recognition experiments show that the proposed method achieves significant improvements in recognition performance over a system using MFCCs alone.
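The abstract does not give the exact form of the voicing feature, so the following is only a minimal sketch of what an SHC-style measure could look like. It assumes the product-of-harmonics definition commonly used in pitch tracking, SHC(f) = sum over f' of the product over r = 1..NH+1 of S(r·f + f'), applied to a peak-normalized magnitude spectrum. The function name `shc_voicing`, all parameter defaults, and the normalization to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shc_voicing(frame, sample_rate, f0_min=60.0, f0_max=400.0,
                n_harmonics=3, window_hz=40.0, n_fft=4096):
    """Return (f0_estimate_hz, voicing_score) for one speech frame.

    Hypothetical SHC-style measure: for every candidate F0 bin, multiply the
    normalized spectrum at harmonics 1..n_harmonics+1 and average the product
    over a small window of frequency offsets. All defaults are assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    # Hann-windowed, zero-padded magnitude spectrum of the frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    peak = spectrum.max()
    if peak == 0.0:
        return 0.0, 0.0  # silent frame: no pitch, no voicing
    spectrum /= peak  # values now lie in [0, 1]

    bin_hz = sample_rate / n_fft
    half_width = int(round(window_hz / (2.0 * bin_hz)))
    offsets = np.arange(-half_width, half_width + 1)
    candidates = np.arange(int(np.ceil(f0_min / bin_hz)),
                           int(np.floor(f0_max / bin_hz)) + 1)

    scores = np.empty(len(candidates))
    for i, k in enumerate(candidates):
        product = np.ones(len(offsets))
        for r in range(1, n_harmonics + 2):
            # Spectrum around the r-th multiple of the candidate F0 bin.
            idx = np.clip(r * k + offsets, 0, len(spectrum) - 1)
            product *= spectrum[idx]
        scores[i] = product.mean()

    best = int(np.argmax(scores))
    return float(candidates[best] * bin_hz), float(scores[best])


# Synthetic check: a 200 Hz harmonic tone versus white noise (40 ms at 16 kHz).
sample_rate = 16000
t = np.arange(640) / sample_rate
voiced = sum(np.sin(2 * np.pi * 200.0 * h * t) for h in range(1, 5))
noise = np.random.default_rng(0).standard_normal(640)

f0_voiced, v_voiced = shc_voicing(voiced, sample_rate)
f0_noise, v_noise = shc_voicing(noise, sample_rate)
```

On these synthetic frames the harmonic tone should score well above the noise frame, which is the property a voicing feature needs. In a pipeline like the one the abstract describes, such a per-frame score would be appended to the MFCC vector before the HLDA projection; that projection and the DNN/HMM back-end are not sketched here.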


Bibliographic details

Source: International Joint Conference on Neural Networks, 2015, pp. 1-8 (8 pages)
Authors: Ying-Wei Tan; Wen-Ju Liu; Wei Jiang; Hao Zheng
Original format: PDF

Similar documents

1. Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks [J]. Ju Lin, Wei Li, Yingming Gao. Journal of VLSI signal processing systems, 2018, No. 7.
2. Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis [J]. Zhen-Hua Ling, Richmond K., Yamagishi J. Audio, Speech, and Language Processing, IEEE Transactions on, 2009, No. 6.
3. Integration of tonal knowledge into phonetic HMMs for recognition of speech in tone languages [J]. Tanee Demeechai, Kimmo Makelainen. Signal processing, 2000, No. 10.
4. Integration of articulatory knowledge and voicing features based on DNN/HMM for Mandarin speech recognition [C]. Ying-Wei Tan, Wen-Ju Liu, Wei Jiang. International Joint Conference on Neural Networks, 2015.
5. Modeling articulatory dynamics using HMM techniques for automatic speech recognition [D]. Erler, Kevin J. 1994.
6. Multi-Talker Speech Promotes Greater Knowledge-Based Spoken Mandarin Word Recognition in First and Second Language Listeners [O]. Seth Wiener, Chao-Yang Lee. 2020.
7. Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis [O]. Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi. 2009.
