Speaker normalization and adaptation based on linear transformation

Abstract

We propose a novel approach to speaker-independent (SI) modeling and speaker adaptation based on a linear transformation. An SI model and speaker-dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes two serious problems: the probability distributions of the SI model become broad, and the SI model does not give good initial estimates for speaker adaptation. To solve these problems, a normalized SI model is generated by removing speaker characteristics using a shift vector obtained by the maximum likelihood linear regression (MLLR) technique. In addition, we propose a speaker adaptation method that combines the MLLR and maximum a posteriori (MAP) techniques, starting from the normalized SI model. Experiments were performed on a Japanese phoneme recognition task using continuous-density Gaussian mixture HMMs. In the baseline recognition test, the normalized SI model achieved a 12.8% reduction in phoneme recognition error rate compared to the conventional SI model. Furthermore, the proposed adaptation method using the normalized SI model was more effective than the tested conventional method regardless of the amount of adaptation data.
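The abstract does not give the estimation formulas, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: diagonal-covariance Gaussians, a hard frame-to-Gaussian alignment, a single global shift vector per speaker (a bias-only MLLR transform), and the standard MAP mean update with a prior weight `tau`. The function names and the `tau` parameter are hypothetical and are not taken from the paper.

```python
def estimate_shift(frames, labels, means, variances):
    """ML estimate of one global shift b, assuming frame ~ N(means[k] + b, variances[k]).

    Assumes diagonal covariances and a hard alignment: labels[t] is the
    index of the Gaussian that frame t is assigned to.
    """
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x, k in zip(frames, labels):
        for d in range(dim):
            # Precision-weighted residual between the frame and the model mean.
            num[d] += (x[d] - means[k][d]) / variances[k][d]
            den[d] += 1.0 / variances[k][d]
    return [n / q for n, q in zip(num, den)]


def remove_shift(frames, shift):
    """Subtract a speaker's shift from that speaker's frames (normalization step)."""
    return [[x[d] - shift[d] for d in range(len(shift))] for x in frames]


def map_update_mean(prior_mean, frames, tau=10.0):
    """Standard MAP mean update: interpolate the prior mean with the data mean.

    tau is an assumed prior weight; a larger tau keeps the result closer to the prior.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(x[d] for x in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Toy usage: two Gaussians, one speaker whose frames are offset from the model.
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[1.0, -0.5], [5.0, 3.5], [1.0, -0.5], [5.0, 3.5]]
labels = [0, 1, 0, 1]

shift = estimate_shift(frames, labels, means, variances)
normalized = remove_shift(frames, shift)
adapted = map_update_mean([0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]], tau=2.0)
print(shift, normalized, adapted)
```

In this reading, a normalized SI model would be trained on frames from which each training speaker's shift has been removed. For a new speaker, a shift estimated from adaptation data would be applied first and MAP updates would then refine individual means as more data becomes available. Whether the paper combines the two steps in exactly this order is not stated in the abstract.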

Bibliographic details

Source: 1997, pp. 1055-1058 (4 pages)
Authors: Ishii, J.; Tonomura, M.
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology

Similar literature

1. Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation [J]. Tsakalidis S., Doumpiotis V., Byrne W. IEEE Transactions on Speech and Audio Processing. 2005, No. 3
2. Fast speaker adaptation using extended diagonal linear transformation for deep neural networks [J]. Donghyun Kim, Sanghun Kim. ETRI Journal. 2019, No. 1
3. A fast maximum likelihood nonlinear feature transformation method for GMM-HMM speaker adaptation [J]. Kaisheng Yao, Dong Yu, Li Deng. Neurocomputing. 2014, No. mara27
4. Speaker normalization and adaptation based on linear transformation [C]. Ishii J., Tonomura M. Institute of Electrical and Electronics Engineers (IEEE) International Conference on Acoustics, Speech, and Signal Processing. 1997
5. Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition [D]. Panchapagesan, Sankaran. 2008
6. Learning linear transformations between counting-based and prediction-based word embeddings [O]. Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi. 2011
7. Investigations on linear transformations for speaker adaptation and normalization [O]. Pitz Michael. 2005
