
Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition



Abstract

Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature-space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we focus on studying the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature-space transforms. Unlike the common speaker adaptation techniques of MAP or MLLR, fMLLR does not change the audio or visual HMM parameters, but simply applies a single transform to the test features. We also address the problem of fast and robust on-line fMLLR adaptation using feature-space maximum a posteriori linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. On average, for a 20-speaker, one-hour independent test set, the multi-stream fMLLR achieves a 31% relative gain in the clean audio condition and a 59% relative gain in the noisy audio condition (approximately 7 dB) compared to the baseline multi-stream system.
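The abstract's two mechanisms, applying a single affine transform to the test features (fMLLR) while the HMM parameters stay fixed, and fusing the audio and visual streams at the decision level, can be illustrated with a minimal NumPy sketch. The function names apply_fmllr and fuse_streams, the fixed stream weight, and the feature dimensions below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def apply_fmllr(features, A, b):
    """Apply a feature-space affine transform x' = A x + b.

    fMLLR leaves the audio and visual HMM parameters untouched; only the
    incoming test features are transformed before likelihood evaluation.
    features: (T, D) array of frame vectors; A: (D, D); b: (D,).
    """
    return features @ A.T + b

def fuse_streams(audio_loglik, visual_loglik, audio_weight=0.7):
    """Decision-level fusion for a multi-stream HMM: combine per-frame
    audio and visual stream log-likelihoods with exponent weights that
    sum to one (fixed here; in practice tuned to the noise condition)."""
    return audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik

# Usage on dummy data: 100 frames of 60-dim audio and 41-dim visual features.
T = 100
audio = np.random.randn(T, 60)
visual = np.random.randn(T, 41)
A_a, b_a = np.eye(60), np.zeros(60)   # speaker transform for the audio stream
A_v, b_v = np.eye(41), np.zeros(41)   # speaker transform for the visual stream
audio_adapted = apply_fmllr(audio, A_a, b_a)
visual_adapted = apply_fmllr(visual, A_v, b_v)
# Per-frame stream log-likelihoods would come from each stream's HMM;
# random values stand in here only to show the fusion step.
fused = fuse_streams(np.random.randn(T), np.random.randn(T))
```

Estimating the transform itself, and its MAP-regularized fMAPLR variant for robust on-line use, additionally requires accumulating Gaussian-level statistics from the adaptation data; only the application of an already estimated transform is sketched here.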
