Rapid Speaker Adaptation Based on Combination of KPCA and Latent Variable Model

Ansari Zohreh; Almasganj Farshad; Kabudian Seyed Jahanshah

Abstract

Speaker adaptation shifts a speaker-independent acoustic model toward the speech characteristics of a new speaker in order to improve speech recognition performance. Kernel eigenspace-based speaker adaptation methods perform satisfactorily with only a small amount of adaptation data. In these methods, kernel principal component analysis (KPCA) is applied to the training speaker space to create a kernel eigenspace, and the acoustic model adapted to the new user is then computed in that space. One limitation of KPCA is that it cannot define a precise pre-image, in the speaker space, of the model adapted in the kernel eigenspace. Therefore, a large amount of computation is required to perform adaptation. Previously developed solutions for calculating an approximate pre-image of the adapted model do not necessarily lead to optimal conditions. In this paper, we therefore propose an efficient solution to this problem that constructs a more reliable pre-image of the adapted model in the speaker space. For this purpose, we use a latent variable model to define a probabilistic description of the mapping between the kernel eigenspace and the speaker space. Experiments were conducted on two speech databases: FARSDAT, a Persian database, and TIMIT, an English database. Using a typical HMM-based automatic speech recognition system, we verified that the proposed method, with about three seconds of adaptation data, achieves relative improvements in phoneme recognition accuracy of up to 4.4% and 7.6% over the speaker-independent model on FARSDAT and TIMIT, respectively. Moreover, the proposed approach outperformed the other kernel eigenspace-based adaptation methods.
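
To make the pre-image problem mentioned in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes random vectors standing in for speaker supervectors, a Gaussian kernel with a hand-picked width `gamma`, and the generic fixed-point iteration for Gaussian kernels as the approximate pre-image step. The paper's latent variable model is not reproduced here; all function and variable names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kpca(X, gamma, n_components):
    # Build the kernel matrix, center it in feature space, and keep
    # the leading eigenvectors (the "kernel eigenspace").
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(vals)              # unit-norm eigenvectors in feature space
    return K, Kc, alphas

def project(x, X, K, alphas, gamma):
    # Coordinates of a point in the kernel eigenspace.
    k = rbf_kernel(x[None, :], X, gamma)[0]
    kc = k - k.mean() - K.mean(axis=0) + K.mean()
    return kc @ alphas

def fixed_point_preimage(beta, X, alphas, gamma, z0, n_iter=100, eps=1e-12):
    # Approximate pre-image of the eigenspace point `beta`: the input-space
    # vector z whose feature map is close to the projected point.
    # No exact solution exists in general, so we iterate.
    g = alphas @ beta
    coef = g - g.mean() + 1.0 / X.shape[0]     # expansion weights including the feature mean
    z = z0.copy()
    for _ in range(n_iter):
        w = coef * rbf_kernel(z[None, :], X, gamma)[0]
        denom = w.sum()
        if abs(denom) < eps:                   # iteration is ill-defined here; stop
            break
        z_new = (w @ X) / denom
        if np.linalg.norm(z_new - z) < 1e-10:
            z = z_new
            break
        z = z_new
    return z

# Synthetic stand-ins: 30 "training speakers", 5-dimensional vectors (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
gamma, m = 0.1, 4
K, Kc, alphas = fit_kpca(X, gamma, m)

x_new = rng.normal(size=5)                     # hypothetical new-speaker vector
beta = project(x_new, X, K, alphas, gamma)     # its representation in the eigenspace
z = fixed_point_preimage(beta, X, alphas, gamma, z0=X.mean(axis=0))
```

The iteration can stall or depend on the starting point `z0`, which is one illustration of why approximate pre-images "do not necessarily lead to optimal conditions" and why a more reliable mapping back to the speaker space is of interest.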

Bibliographic details

Source
Circuits, systems and signal processing | 2021, Issue 8 | pp. 3996-4017 | 22 pages
Authors
Ansari Zohreh; Almasganj Farshad; Kabudian Seyed Jahanshah
Author affiliations

Meybod Univ Engn Fac Yahyazadeh Blv Khorramshahr Blv POB 8961699557 Meybod Yazd Iran;

Amirkabir Univ Technol Biomed Engn Dept Speech Proc Lab Tehran Iran;

Razi Univ Engn Fac Kermanshah Iran;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Rapid speaker adaptation; Eigenvoice speaker adaptation; Kernel eigenvoice speaker adaptation; Latent variable model; Speech recognition;

