IEEE Transactions on Speech and Audio Processing

A novel feature transformation for vocal tract length normalization in automatic speech recognition


Abstract

This paper proposes a method for transforming acoustic models that have been trained on one group of speakers so that they can be used on speech from a different group in hidden Markov model based (HMM-based) automatic speech recognition. Features are transformed on the basis of assumptions about the difference in vocal tract length between the groups of speakers. First, the vocal tract length (VTL) of each group is estimated from its average third formant, F3. Second, the linear acoustic theory of speech production is applied to warp the spectral characteristics of the existing models so that they match the incoming speech. The mapping is composed of a sequence of nonlinear submappings. By locally linearizing it and comparing results in the output, a linear approximation of the exact mapping is obtained, which is accurate as long as the warping is reasonably small. The feature vector, which is computed from a speech frame, consists of the mel-frequency cepstral coefficients (MFCC) along with the delta and delta-delta cepstra as well as the delta and delta-delta energy. The method has been tested on the TI digits database, which contains adult and children's speech in the form of isolated digits and digit strings of different lengths. When the models are trained on adults and tested on children, using the transformed adult models reduces the word error rate by more than a factor of two compared to the untransformed case.
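The first step described in the abstract, estimating VTL from the average F3, can be illustrated with a minimal sketch. It assumes the simplest model from linear acoustic theory, a uniform lossless tube closed at the glottis and open at the lips, whose n-th resonance is F_n = (2n - 1) c / (4 L). The abstract does not say which tube model, speed of sound, or formant values the authors used, so the function names, the constant, and the F3 values below are illustrative assumptions only, not the paper's procedure.

```python
# Assumption: approximate speed of sound in warm, moist air, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_from_f3(f3_hz, c=SPEED_OF_SOUND_CM_S):
    """Estimate vocal tract length (cm) from the third formant (Hz).

    Assumes a uniform tube closed at one end, so F_n = (2n - 1) * c / (4 * L).
    For n = 3 this gives L = 5 * c / (4 * F3).
    """
    if f3_hz <= 0:
        raise ValueError("F3 must be positive")
    return 5.0 * c / (4.0 * f3_hz)


def warp_factor(f3_source_hz, f3_target_hz):
    """Frequency scaling factor from the source group to the target group.

    Under the uniform-tube assumption all resonances scale inversely with
    tube length, so the factor equals L_source / L_target = F3_target / F3_source.
    """
    if f3_source_hz <= 0 or f3_target_hz <= 0:
        raise ValueError("F3 values must be positive")
    return f3_target_hz / f3_source_hz


if __name__ == "__main__":
    # Hypothetical average F3 values, chosen only for illustration.
    f3_adults = 2500.0
    f3_children = 3000.0
    print("adult VTL (cm):", vtl_from_f3(f3_adults))
    print("child VTL (cm):", vtl_from_f3(f3_children))
    print("warp factor adults -> children:", warp_factor(f3_adults, f3_children))
```

Under this simplified model the warp factor is a single scalar applied to the frequency axis. The method in the abstract goes further, propagating the warp through the nonlinear steps of the MFCC computation and linearizing the result, which this sketch does not attempt.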