Conference: International Joint Conference on Neural Networks

Integration of articulatory knowledge and voicing features based on DNN/HMM for Mandarin speech recognition

Abstract

Speech production knowledge has been used successfully to enhance both the phonetic representation and the performance of automatic speech recognition (ASR) systems. Representations of speech production offer simple explanations for many phenomena observed in speech that cannot easily be analyzed from either the acoustic signal or the phonetic transcription alone. One of the most important aspects of speech production knowledge is articulatory knowledge, which describes the smooth, continuous movements of the vocal tract. In this paper, we present a new articulatory model that provides information for rescoring the hypotheses in speech recognition lattices. The articulatory model consists of a feature front-end, which computes a voicing feature based on a spectral harmonics correlation (SHC) function, and a back-end that combines deep neural networks (DNNs) with hidden Markov models (HMMs). The voicing features are combined with standard Mel-frequency cepstral coefficients (MFCCs) using heteroscedastic linear discriminant analysis (HLDA) to improve recognition accuracy. The algorithm thereby exploits the complementary strengths of the two models: it retains the deep learning properties of DNNs while using HMMs to model articulatory context effectively. Mandarin speech recognition experiments show that the proposed method achieves significant improvements in recognition performance over a baseline system that uses MFCCs alone.
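The abstract names the spectral harmonics correlation (SHC) function but does not give its formula or say how the voicing value is derived from it. The sketch below is therefore a minimal illustration under stated assumptions: it uses the SHC form from the YAAPT pitch tracker (Zahorian and Hu), SHC(f) = sum over f' in [-W/2, W/2] of the product over r = 1..NH+1 of S(r·f + f'), and turns it into a hypothetical voicing feature (the log ratio of the SHC peak to its mean over the pitch search range). The function names `shc_scores`, `voicing_feature` and `append_voicing`, and the defaults (3 harmonics, 40 Hz spectral window, 60 to 400 Hz search range, 8192-point FFT), are illustrative choices, not the paper's.

```python
import numpy as np


def shc_scores(frame, fs, f0_candidates, n_harmonics=3, window_hz=40.0, nfft=8192):
    """Spectral harmonics correlation (SHC) for one frame.

    Assumed form (YAAPT-style, not taken from this paper):
        SHC(f) = sum_{f'=-W/2}^{W/2} prod_{r=1}^{NH+1} S(r*f + f')
    with S the magnitude spectrum, NH the number of harmonics, W the window in Hz.
    """
    # Magnitude spectrum of the Hamming-windowed frame, zero-padded to nfft points.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=nfft))
    bin_hz = fs / nfft
    half_w = int(round(window_hz / 2 / bin_hz))
    offsets = np.arange(-half_w, half_w + 1)          # f' expressed in FFT bins
    harmonics = np.arange(1, n_harmonics + 2)          # r = 1 .. NH+1
    # FFT bin of r*f for every candidate f: shape (n_candidates, NH+1).
    centre = np.round(np.outer(f0_candidates, harmonics) / bin_hz).astype(int)
    # Add every offset f': shape (n_candidates, NH+1, n_offsets), clipped to the spectrum.
    idx = np.clip(centre[:, :, None] + offsets[None, None, :], 0, len(spectrum) - 1)
    # Product over harmonics, then sum over the spectral window.
    return spectrum[idx].prod(axis=1).sum(axis=1)


def voicing_feature(frame, fs, f0_min=60.0, f0_max=400.0):
    """Hypothetical voicing feature: log(peak SHC / mean SHC) over the search range.

    Returns (feature, f0_estimate_hz). The normalisation is an illustrative choice;
    it makes the value independent of the frame's overall amplitude.
    """
    f0_candidates = np.arange(f0_min, f0_max + 1.0, 1.0)   # 1 Hz grid
    scores = shc_scores(frame, fs, f0_candidates)
    peak = int(np.argmax(scores))
    feature = float(np.log(scores[peak] / (scores.mean() + 1e-12)))
    return feature, float(f0_candidates[peak])


def append_voicing(mfcc, voicing):
    """Concatenate one voicing value to each MFCC vector (assumed combination step).

    mfcc: (n_frames, n_ceps); voicing: (n_frames,). The HLDA projection that the
    paper applies to the combined features is not sketched here.
    """
    return np.hstack([mfcc, np.asarray(voicing)[:, None]])


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.04 * fs)) / fs                 # one 40 ms frame
    # Synthetic voiced frame: 150 Hz fundamental with 1/r harmonic amplitudes.
    voiced = sum(np.sin(2 * np.pi * 150 * r * t) / r for r in range(1, 11))
    unvoiced = np.random.default_rng(0).standard_normal(len(t))   # white noise
    print("voiced:", voicing_feature(voiced, fs))
    print("unvoiced:", voicing_feature(unvoiced, fs))
```

On the synthetic harmonic frame the SHC should peak near 150 Hz and the voicing value should exceed that of the noise frame. Multiplying the spectrum at several harmonic positions is what makes SHC selective: a candidate scores highly only when every sampled harmonic lands on spectral energy, which suppresses sub-octave candidates.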