Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

Abstract

In this paper, we propose a novel acoustic model training method that is suitable for speaker adaptation in speech recognition. Our method is based on generating features from the data of a small number of speakers. Speaker adaptation methods have been widely used for decades. Such methods require a certain amount of adaptation data, and if the data are insufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker cover a wider range of speakers, speaker adaptation can perform robustly. To build such robust seed models, we adopt feature generation based on inverse maximum likelihood linear regression (MLLR) transformation, and then train our seed models on the generated features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of these MLLR transformation matrices using principal component analysis (PCA). Next, we estimate the distribution of the weight parameters that express the existing speakers' MLLR transformation matrices in terms of these bases. We then generate pseudo-speaker MLLR transformations by sampling weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate pseudo-speaker features. Finally, we train the acoustic seed models using these features. With these seed models, we obtained better speaker adaptation results than with models that were simply adapted to the environment.
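The pipeline in the abstract can be illustrated with a minimal NumPy sketch. It assumes that each speaker transform is a feature-space affine map y = A x + b stored as a d × (d+1) matrix [A | b] (as in constrained MLLR, CMLLR, which the title names). It also assumes that the PCA weights follow independent Gaussians, which is an assumption because the abstract does not name the distribution, and it uses toy random data throughout. The function names `pca_bases`, `sample_pseudo_transforms`, and `inverse_cmllr` are hypothetical and are not the authors' code.

```python
import numpy as np


def pca_bases(transforms, n_bases):
    """Vectorize per-speaker transforms and extract their top PCA bases.

    transforms: array of shape (S, d, d + 1), one [A | b] matrix per speaker.
    Returns the mean vector, the bases (n_bases, d*(d+1)), and each
    speaker's weights (S, n_bases) with respect to those bases.
    """
    S = transforms.shape[0]
    X = transforms.reshape(S, -1)
    mean = X.mean(axis=0)
    # Rows of Vt from the SVD of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    bases = Vt[:n_bases]
    weights = (X - mean) @ bases.T
    return mean, bases, weights


def sample_pseudo_transforms(mean, bases, weights, n_pseudo, d, rng):
    """Draw pseudo-speaker transforms by sampling PCA weights.

    Assumption: each weight is an independent Gaussian whose mean and
    standard deviation are estimated from the existing speakers.
    """
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0, ddof=1)
    w = rng.normal(mu, sigma, size=(n_pseudo, mu.shape[0]))
    X = mean + w @ bases
    return X.reshape(n_pseudo, d, d + 1)


def inverse_cmllr(features, transform):
    """Invert the feature-space map y = A x + b, i.e. x = A^{-1} (y - b).

    features: (T, d) normalized feature frames; transform: (d, d + 1).
    """
    A, b = transform[:, :-1], transform[:, -1]
    return np.linalg.solve(A, (features - b).T).T


rng = np.random.default_rng(0)
d, S = 3, 10
# Toy speaker transforms near the identity [I | 0]; real ones would be
# estimated from each speaker's adaptation data.
identity = np.hstack([np.eye(d), np.zeros((d, 1))])
speaker_transforms = identity + 0.05 * rng.standard_normal((S, d, d + 1))

mean, bases, weights = pca_bases(speaker_transforms, n_bases=4)
pseudo = sample_pseudo_transforms(mean, bases, weights, n_pseudo=20, d=d, rng=rng)

# Stand-in for speaker-normalized features (random, illustrative only).
normalized_feats = rng.standard_normal((100, d))
pseudo_feats = [inverse_cmllr(normalized_feats, W) for W in pseudo]
print(len(pseudo_feats), pseudo_feats[0].shape)  # prints: 20 (100, 3)
```

In the method described above, the pseudo-speaker features (not this toy data) would then be pooled to train the seed models. The per-weight Gaussian is only one simple choice for modeling the weight distribution.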

Bibliographic details

Source
2011 IEEE Workshop on Automatic Speech Recognition & Understanding | 2011 | pp. 169-172 | 4 pages
Conference location: Waikoloa, HI (US)
Authors
Itoh Arata; Hara Sunao; Kitaoka Norihide; Takeda Kazuya
Author affiliation

Department of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Aichi 464-8603, Japan

Original format: PDF
Text language: English
Chinese Library Classification: Electroacoustic technology and speech signal processing

Similar documents

1. Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition [J]. Arata Itoh, Sunao Hara, Norihide Kitaoka. IEICE Transactions on Information and Systems, 2012, no. 10.
2. Robust speaker adaptation based on parallel factor analysis of training models [J]. Y. Jeong. Electronics Letters, 2011, no. 7.
3. Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation [C]. Arata Itoh, Sunao Hara, Norihide Kitaoka. IEEE Workshop on Automatic Speech Recognition & Understanding, 2011.
4. Feature and model transformation techniques for robust speaker verification [D]. Kwok Kwong Yiu. 2005.
5. Adapting to Adaptations: Behavioural Strategies that are Robust to Mutations and Other Organisational-Transformations [O]. Matthew D. Egbert, Juan Pérez-Mercader.
6. Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition [O]. Arata Itoh, Sunao Hara, Norihide Kitaoka. 2012.