
Investigations on linear transformations for speaker adaptation and normalization


Abstract

This thesis deals with linear transformations at various stages of the automatic speech recognition process. In current state-of-the-art speech recognition systems, linear transformations are widely used to compensate for a potential mismatch between the training and testing data and thus improve recognition performance. A large number of approaches have been proposed in the literature, but the connections between them have been disregarded so far. By developing a unified mathematical framework, close relationships between the individual approaches are identified and analyzed in detail. Mel-frequency cepstral coefficients (MFCCs) are commonly used features for automatic speech recognition systems. The traditional way of computing MFCCs suffers from a twofold smoothing, which complicates both the MFCC computation and the system optimization. An improved approach is developed that does not use any filter bank and thus avoids the twofold smoothing. This integrated approach allows a very compact implementation and requires fewer parameters to be optimized. Starting from this new computation scheme for MFCCs, it is proven analytically that vocal tract normalization (VTN) is equivalent to a linear transformation in the cepstral space for arbitrary invertible warping functions. As examples, the transformation matrix for VTN is calculated explicitly for three commonly used warping functions. Based on some general characteristics of typical VTN warping functions, a common structure of the transformation matrix is derived that is almost independent of the specific functional form of the warping function. Expressing VTN as a linear transformation makes it possible, for the first time, to take the Jacobian determinant of the transformation into account for any warping function. The effect of considering the Jacobian determinant on the warping factor estimation is studied in detail.
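The claim that frequency warping acts linearly on cepstral coefficients can be illustrated numerically. The following is a minimal sketch, not the thesis's derivation: it assumes the log spectrum is a truncated cosine series in the cepstral coefficients, and it uses the bilinear (all-pass) warping function purely as an illustrative choice. The names `bilinear_warp` and `warp_matrix`, the parameter `a`, and the midpoint-rule integration are all assumptions made for this example.

```python
import numpy as np

def bilinear_warp(omega, a):
    """Bilinear (all-pass) frequency warping; maps [0, pi] onto [0, pi] for |a| < 1."""
    return omega + 2.0 * np.arctan(a * np.sin(omega) / (1.0 - a * np.cos(omega)))

def warp_matrix(num_ceps, a, num_points=2048):
    """Matrix A such that A @ c approximates the cepstrum of the warped spectrum.

    Assumes the log spectrum is S(w) = sum_k c[k] * cos(k * w). The warped
    spectrum S(g(w)) is projected back onto cos(n * w), n < num_ceps, so the
    result is truncated to num_ceps coefficients.
    """
    # Midpoint grid on [0, pi] for numerical integration.
    omega = (np.arange(num_points) + 0.5) * np.pi / num_points
    warped = bilinear_warp(omega, a)
    n = np.arange(num_ceps)
    basis = np.cos(np.outer(n, omega))          # cos(n * w)
    warped_basis = np.cos(np.outer(n, warped))  # cos(k * g(w))
    # Cosine-series normalization: 1/pi for n = 0, 2/pi otherwise,
    # combined with the integration step pi / num_points.
    scale = np.full(num_ceps, 2.0)
    scale[0] = 1.0
    return (scale[:, None] / num_points) * (basis @ warped_basis.T)

# Usage: warp a hypothetical 13-dimensional cepstral vector.
ceps = np.random.default_rng(0).normal(size=13)
A = warp_matrix(13, a=0.05)
warped_ceps = A @ ceps
```

Because the matrix depends only on the warping function and not on the signal, it can be precomputed once per warping factor. With `a = 0` the warping is the identity and the matrix reduces to the identity matrix.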
The second part of this thesis deals with a specific linear transformation for speaker adaptation, the Maximum Likelihood Linear Regression (MLLR) approach. Based on the close relationship between MLLR and VTN proven in the first part, the general structure of the VTN matrix is adopted to restrict the MLLR matrix to a band structure, which significantly improves MLLR adaptation when only limited adaptation data are available. Finally, several enhancements to MLLR speaker adaptation are discussed. One deals with refined definitions of regression classes, which is of special importance for fast adaptation when only limited adaptation data are available. Another makes use of confidence measures to account for recognition errors in the first pass of a two-pass adaptation process, which would otherwise degrade adaptation performance.
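To make the band restriction concrete, the sketch below fits an affine mean transform whose matrix is zero outside a band around the diagonal. It is a simplified stand-in and not the thesis's estimator: it uses ordinary least squares per matrix row, which corresponds to maximum likelihood only under the assumption of identity covariances and equal occupation counts. The function names, the `half_width` parameter, and the `targets` input (for example, per-Gaussian averages of the adaptation data) are hypothetical.

```python
import numpy as np

def band_mask(dim, half_width):
    """Boolean mask that is True where |i - j| <= half_width."""
    idx = np.arange(dim)
    return np.abs(idx[:, None] - idx[None, :]) <= half_width

def fit_band_transform(means, targets, half_width):
    """Least-squares fit of targets ~ means @ A.T + b with A restricted to a band.

    means:   (M, D) speaker-independent Gaussian means
    targets: (M, D) adaptation statistics for the same Gaussians (assumed given)
    """
    num, dim = means.shape
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    mask = band_mask(dim, half_width)
    for i in range(dim):
        # Only the in-band columns of row i (plus the bias) are free parameters.
        cols = np.flatnonzero(mask[i])
        design = np.hstack([means[:, cols], np.ones((num, 1))])
        sol, *_ = np.linalg.lstsq(design, targets[:, i], rcond=None)
        A[i, cols] = sol[:-1]
        b[i] = sol[-1]
    return A, b
```

A full matrix has D free parameters per row, while a band of half-width w has at most 2w + 1, which is why the restricted form can be estimated from less adaptation data.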


Bibliographic details

Author: Pitz, Michael
Year: 2005
Format: PDF
Language: English

