Journal: Computer Speech and Language

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC



Abstract

Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. We recently proposed a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. In this paper, we present the mathematical derivation of the LT and give a compact formula to calculate it for any FW function. We also show that our LT is closely related to different LTs previously proposed for FW with cepstral features, and these LTs for FW are all shown to be numerically almost identical for the sine-log all-pass transform (SLAPT) warping functions. Our formula for the transformation matrix is, however, computationally simpler and, unlike other previous LT approaches to VTLN with MFCC features, no modification of the standard MFCC feature extraction scheme is required. In VTLN and speaker adaptive modeling (SAM) experiments with the DARPA resource management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation. This demonstrates that the approximations involved do not lead to any performance degradation. Performance comparable to front end VTLN was also obtained with LT adaptation of HMM means in the back end, combined with mean bias and variance adaptation according to the maximum likelihood linear regression (MLLR) framework. The FW methods performed significantly better than standard MLLR for very limited adaptation data (1 utterance), and were equally effective with unsupervised parameter estimation. We also performed speaker adaptive training (SAT) with feature space LT denoted CLTFW. Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained significantly improved results compared to VTLN, with increasing adaptation data.
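
The idea of replacing filterbank warping with a matrix applied to unwarped cepstra can be illustrated with a minimal sketch. This is not the paper's formula: it assumes an orthonormal DCT-II, a hypothetical piecewise-linear warp applied to the normalized filter-index axis (rather than SLAPT), and a single diagonal Gaussian standing in for the HMM when scoring. The function names, the knee at 0.8, and the 13-cepstrum/26-filter sizes are all illustrative assumptions. The sketch inverts the DCT, resamples the smoothed log filterbank outputs at warped positions, and re-applies the DCT, so the whole operation collapses to one matrix per warping factor. The MLS grid search then only needs matrix products plus a Jacobian term.

```python
import numpy as np


def dct_matrix(num_ceps, num_filters):
    """Orthonormal DCT-II matrix of shape (num_ceps, num_filters)."""
    k = np.arange(num_ceps)[:, None]
    i = np.arange(num_filters)[None, :]
    scale = np.full((num_ceps, 1), np.sqrt(2.0 / num_filters))
    scale[0, 0] = np.sqrt(1.0 / num_filters)
    return scale * np.cos(np.pi * k * (i + 0.5) / num_filters)


def piecewise_linear_warp(u, alpha, knee=0.8):
    """Hypothetical warp of normalized frequency u in [0, 1]; keeps 0 and 1 fixed."""
    upper = alpha * knee + (1.0 - alpha * knee) * (u - knee) / (1.0 - knee)
    return np.where(u <= knee, alpha * u, upper)


def warp_matrix(alpha, num_ceps=13, num_filters=26):
    """Matrix A such that A @ c approximates the cepstra of the warped spectrum."""
    C = dct_matrix(num_ceps, num_filters)
    u = (np.arange(num_filters) + 0.5) / num_filters  # filter positions in [0, 1]
    psi = piecewise_linear_warp(u, alpha)              # warped positions
    n = np.arange(num_ceps)[None, :]
    scale = np.full((1, num_ceps), np.sqrt(2.0 / num_filters))
    scale[0, 0] = np.sqrt(1.0 / num_filters)
    # Inverse DCT evaluated at the warped positions: (num_filters, num_ceps).
    warped_idct = scale * np.cos(np.pi * n * psi[:, None])
    return C @ warped_idct


def estimate_alpha(frames, mean, var, alphas):
    """Grid search for the alpha maximizing log-likelihood plus the Jacobian term."""
    best_alpha, best_score = None, -np.inf
    num_frames, num_ceps = frames.shape
    for alpha in alphas:
        A = warp_matrix(alpha, num_ceps=num_ceps)
        warped = frames @ A.T
        _, logdet = np.linalg.slogdet(A)
        loglik = -0.5 * np.sum((warped - mean) ** 2 / var + np.log(2 * np.pi * var))
        score = loglik + num_frames * logdet  # log|det A| once per frame
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


if __name__ == "__main__":
    # Synthetic check: un-warp Gaussian samples with a known factor, then search.
    rng = np.random.default_rng(0)
    mean, var = rng.normal(size=13), np.ones(13)
    target = mean + rng.normal(size=(500, 13))
    frames = np.linalg.solve(warp_matrix(1.1), target.T).T
    print(estimate_alpha(frames, mean, var, np.arange(0.88, 1.13, 0.02)))
```

With `alpha = 1.0` the warp is the identity and `warp_matrix` returns the identity matrix, because the rows of the orthonormal DCT are orthonormal. The Jacobian term `log|det A|` is included because the transform is applied in feature space; dropping it would bias the search toward matrices that shrink the features.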
