
Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition



Abstract

Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature-space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we focus on studying the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature-space transforms. Unlike the common speaker adaptation techniques of MAP or MLLR, fMLLR does not change the audio or visual HMM parameters, but simply applies a single transform to the test features. We also address the problem of fast and robust on-line fMLLR adaptation using feature-space maximum a posteriori linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. On average, for a 20-speaker, one-hour independent test set, the multi-stream fMLLR achieves a 31% relative gain in the clean audio condition and a 59% relative gain in the noisy audio condition (approximately 7 dB) compared to the baseline multi-stream system.
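The abstract's two mechanisms, applying a single affine transform to the test features (fMLLR) while the HMM parameters stay fixed, and fusing the audio and visual streams at the decision level, can be illustrated with a minimal NumPy sketch. The function names apply_fmllr and fuse_streams, the fixed stream weight, and the feature dimensions below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def apply_fmllr(features, A, b):
    """Apply a feature-space affine transform x' = A x + b.

    fMLLR leaves the audio and visual HMM parameters untouched; only the
    incoming test features are transformed before likelihood evaluation.
    features: (T, D) array of frame vectors; A: (D, D); b: (D,).
    """
    return features @ A.T + b

def fuse_streams(audio_loglik, visual_loglik, audio_weight=0.7):
    """Decision-level fusion for a multi-stream HMM: combine per-frame
    audio and visual stream log-likelihoods with exponent weights that
    sum to one (fixed here; in practice tuned to the noise condition)."""
    return audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik

# Usage on dummy data: 100 frames of 60-dim audio and 41-dim visual features.
T = 100
audio = np.random.randn(T, 60)
visual = np.random.randn(T, 41)
A_a, b_a = np.eye(60), np.zeros(60)   # speaker transform for the audio stream
A_v, b_v = np.eye(41), np.zeros(41)   # speaker transform for the visual stream
audio_adapted = apply_fmllr(audio, A_a, b_a)
visual_adapted = apply_fmllr(visual, A_v, b_v)
# Per-frame stream log-likelihoods would come from each stream's HMM;
# random values stand in here only to show the fusion step.
fused = fuse_streams(np.random.randn(T), np.random.randn(T))
```

Estimating the transform itself, and its MAP-regularized fMAPLR variant for robust on-line use, additionally requires accumulating Gaussian-level statistics from the adaptation data; only the application of an already estimated transform is sketched here.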
