Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

Sankaran Panchapagesan; Abeer Alwan


Abstract

Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. We recently proposed a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. In this paper, we present the mathematical derivation of the LT and give a compact formula to calculate it for any FW function. We also show that our LT is closely related to different LTs previously proposed for FW with cepstral features, and these LTs for FW are all shown to be numerically almost identical for the sine-log all-pass transform (SLAPT) warping functions. Our formula for the transformation matrix is, however, computationally simpler and, unlike other previous LT approaches to VTLN with MFCC features, no modification of the standard MFCC feature extraction scheme is required. In VTLN and speaker adaptive modeling (SAM) experiments with the DARPA resource management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation. This demonstrates that the approximations involved do not lead to any performance degradation. Performance comparable to front end VTLN was also obtained with LT adaptation of HMM means in the back end, combined with mean bias and variance adaptation according to the maximum likelihood linear regression (MLLR) framework. The FW methods performed significantly better than standard MLLR for very limited adaptation data (1 utterance), and were equally effective with unsupervised parameter estimation. We also performed speaker adaptive training (SAT) with feature space LT denoted CLTFW. 
Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained significantly improved results compared to VTLN, with increasing adaptation data.
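
The idea of a linear transform that warps frequency directly in the cepstral domain, with the warping parameter chosen by maximum likelihood, can be illustrated with a minimal sketch. This is not the paper's compact formula, only a generic construction under stated assumptions: cepstra are an unnormalized DCT-II of log filterbank outputs sampled at half-sample points of a normalized (Mel-scale) frequency u in [0, 1]; the warp is a hypothetical one-parameter sine-based function chosen for illustration; and the likelihood model is a single diagonal Gaussian standing in for an HMM. The function names, the sizes 13 and 26, and the Jacobian term in the score are all assumptions of this sketch.

```python
import numpy as np

def dct_matrix(n_ceps, n_filt):
    """Unnormalized DCT-II: row k is cos(pi * k * u_m), u_m = (m + 0.5) / n_filt."""
    u = (np.arange(n_filt) + 0.5) / n_filt
    k = np.arange(n_ceps)[:, None]
    return np.cos(np.pi * k * u[None, :])

def sine_warp(u, alpha):
    """Hypothetical one-parameter warp of normalized frequency u in [0, 1].
    It fixes the endpoints 0 and 1 and is monotonic when |alpha| < 1 / pi."""
    return u + alpha * np.sin(np.pi * u)

def warp_matrix(alpha, n_ceps=13, n_filt=26):
    """Matrix T such that T @ c approximates the cepstrum of the warped log spectrum.
    The inverse DCT is evaluated at warped frequencies, then the DCT is re-applied."""
    u = (np.arange(n_filt) + 0.5) / n_filt
    k = np.arange(n_ceps)[None, :]
    scale = np.where(k == 0, 1.0, 2.0) / n_filt  # inverse DCT-II weights
    idct_warped = scale * np.cos(np.pi * k * sine_warp(u, alpha)[:, None])
    return dct_matrix(n_ceps, n_filt) @ idct_warped

def estimate_alpha(feats, mean, var, grid):
    """Grid search for the alpha that maximizes a diagonal-Gaussian log-likelihood
    of the transformed features, including the log-determinant (Jacobian) term."""
    best_alpha, best_ll = None, -np.inf
    for alpha in grid:
        T = warp_matrix(alpha, n_ceps=feats.shape[1])
        y = feats @ T.T  # apply the transform to every frame
        _, logdet = np.linalg.slogdet(T)
        ll = -0.5 * np.sum((y - mean) ** 2 / var + np.log(2 * np.pi * var))
        ll += len(feats) * logdet
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

# Usage on synthetic frames (random data, for shape checking only)
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))
grid = np.linspace(-0.1, 0.1, 21)
alpha_hat = estimate_alpha(frames, mean=np.zeros(13), var=np.ones(13), grid=grid)
```

Because the transform acts on cepstra that are already computed, each candidate warping parameter costs one matrix multiplication rather than a new pass through the filterbank, which is the efficiency argument the abstract makes for LT-based estimation.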


Bibliographic details

Source: Computer speech and language, 2008, issue 1, pp. 42-64 (23 pages)
Authors: Sankaran Panchapagesan; Abeer Alwan
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: automatic speech recognition; speaker normalization; VTLN; frequency warping; linear transformation; speaker adaptation
Added to database: 2022-08-18 02:12:09

Similar literature

1. VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC [J]. Sanand D.R., Umesh S. Audio, Speech, and Language Processing, IEEE Transactions on. 2012, issue 5
2. Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis [J]. Weixun Gao, Qiying Cao. Journal of information science and engineering. 2014, issue 4
3. Fast speaker adaptation using extended diagonal linear transformation for deep neural networks [J]. Donghyun Kim, Sanghun Kim. ETRI journal. 2019, issue 1
4. Revisiting VTLN Using Linear Transformation on Conventional MFCC [C]. D. R. Sanand, R. Schlueter, H. Ney. Annual conference of the International Speech Communication Association; INTERSPEECH 2010. 2011
5. Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition [D]. Panchapagesan, Sankaran. 2008
6. Comparison of piece‐wise linear, linear, and nonlinear atlas‐to‐patient warping techniques: Analysis of the labeling of subcortical nuclei for functional neurosurgical applications [O]. M. Mallar Chakravarty, Abbas F. Sadikot, Jürgen Germann. 2009
7. Investigations on linear transformations for speaker adaptation and normalization [O]. Pitz Michael. 2005
