Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition.

Abstract

Vocal Tract Length Normalization (VTLN) for standard filterbank-based Mel Frequency Cepstral Coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. In this dissertation, we present a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. Our formula for the transformation matrix is computationally simpler than previous LT approaches and requires no modification of the standard MFCC feature extraction scheme.

In VTLN and Speaker Adaptive Modeling (SAM) experiments with the Resource Management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank. This demonstrates that the approximations involved in the LT do not lead to any performance degradation. We also performed Speaker Adaptive Training (SAT) with the feature-space LT, denoted CLTFW. Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained results that improved significantly over VTLN as the amount of adaptation data increased.

In the second part of the dissertation, vocal tract (VT) inversion, which recovers the VT shape sequence from speech signals, is performed for vowels by cepstral analysis-by-synthesis, using chain-matrix calculation of VT acoustics and the Maeda articulatory model. The derivative of the VT chain matrix with respect to the area function was calculated in a novel, efficient manner and used in the BFGS quasi-Newton method to optimize a cost function that includes a distance measure between input and synthesized cepstral sequences, together with regularization and continuity terms. Inversion was evaluated on data from the University of Wisconsin X-ray microbeam (XRMB) database. Good agreement was achieved between inverted midsagittal VT outlines and measured XRMB tongue and lip pellet positions, with smooth optimized articulatory trajectories and an average relative error of less than 3% in the first three formants.
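
The conventional VTLN baseline described in the abstract (warping the Mel filterbank center frequencies and choosing the warping factor by a maximum likelihood score) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the dissertation: the piecewise-linear warping function, the `cut_ratio` knee, the common 2595·log10(1 + f/700) Mel formula, the function names, and the toy scoring function that stands in for an acoustic-model likelihood. The sketch does not implement the dissertation's linear transform.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies (Hz) of n_filters filters equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

def piecewise_linear_warp(f_hz, alpha, f_max, cut_ratio=0.85):
    """Hypothetical piecewise-linear warp: scale by alpha below a knee,
    then follow a straight line that maps f_max to f_max."""
    # Shrinking the knee for alpha > 1 keeps alpha * f_cut below f_max,
    # so the second segment always has a positive slope.
    f_cut = cut_ratio * f_max / max(1.0, alpha)
    if f_hz <= f_cut:
        return alpha * f_hz
    y_cut = alpha * f_cut
    slope = (f_max - y_cut) / (f_max - f_cut)
    return y_cut + slope * (f_hz - f_cut)

def estimate_alpha(score_fn, alphas):
    """Grid search: return the candidate alpha with the highest score.
    score_fn is a stand-in for the likelihood of the warped features."""
    return max(alphas, key=score_fn)

if __name__ == "__main__":
    centers = mel_center_frequencies(26, 0.0, 8000.0)
    warped = [piecewise_linear_warp(f, 1.1, 8000.0) for f in centers]
    print([round(f) for f in warped[:3]])

    # Toy score peaked near alpha = 1.1; a real system would score the
    # warped features against an acoustic model instead.
    grid = [0.88 + 0.02 * i for i in range(13)]
    print(estimate_alpha(lambda a: -(a - 1.1) ** 2, grid))
```

In a grid search of this kind, each candidate factor requires recomputing features through the warped filterbank before scoring, which is the cost that a linear-transform equivalent of frequency warping is meant to reduce.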

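Bibliographic record

Author: Panchapagesan, Sankaran
Author affiliation: University of California, Los Angeles
Degree-granting institution: University of California, Los Angeles
Discipline: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2008
Pagination: 108 p.
Total pages: 108
Original format: PDF
Language of text: eng
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords:
Date indexed: 2022-08-17 11:38:42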
