Annual Conference of the International Speech Communication Association

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement



Abstract

Models for automatic speech recognition (ASR) hold detailed information about the spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase recognition rates, especially in low-SNR conditions, we suggest using the additional visual modality, which is mostly unaffected by degradations in the acoustic channel. An optimal integration of acoustic and visual information is achievable by joint inference over both modalities within the turbo-decoding framework. By thus combining turbo-decoding with Twin-HMMs for speech enhancement, notable improvements can be achieved, not only in instrumental estimates of speech quality but also in actual speech intelligibility. This is verified through listening tests, which show that in highly challenging noise conditions, average human recognition accuracy can be improved from 64% without signal processing to 80% with the presented architecture.
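The turbo-decoding idea summarized in the abstract can be illustrated with a small example. The following is a minimal, hypothetical sketch, not the authors' implementation: two HMM decoders over a shared state space, one driven by acoustic and one by visual observation likelihoods, iteratively exchange per-frame state posteriors as extrinsic priors. The function names (forward_backward, turbo_decode), the synthetic model parameters, and the product fusion at the end are all illustrative assumptions, and the Twin-HMM step that maps decoded states back to a clean-speech estimate is omitted.

```python
# Hedged sketch (not the paper's code): turbo-style exchange of state
# posteriors between an acoustic and a visual HMM decoder that are assumed
# to share one state space. All parameters below are synthetic placeholders.
import numpy as np

def forward_backward(A, obs_lik, extrinsic=None):
    """Per-frame state posteriors gamma[t, s] for an HMM with transition
    matrix A and frame-wise observation likelihoods obs_lik[t, s];
    'extrinsic' holds optional per-frame priors from the other modality."""
    T, S = obs_lik.shape
    b = obs_lik.copy()
    if extrinsic is not None:
        b *= extrinsic                      # fuse the other decoder's message
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = b[0] / S
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                   # forward pass (normalized)
        alpha[t] = b[t] * (alpha[t - 1] @ A)
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):          # backward pass (normalized)
        beta[t] = A @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def turbo_decode(A, audio_lik, video_lik, iterations=4):
    """Iteratively exchange extrinsic state posteriors between the acoustic
    and visual decoders (turbo principle); returns fused posteriors."""
    msg_av = None                           # message video -> audio
    for _ in range(iterations):
        gamma_a = forward_backward(A, audio_lik, extrinsic=msg_av)
        # extrinsic part of the audio posterior (divide out the video message)
        msg_va = (gamma_a.copy() if msg_av is None
                  else gamma_a / np.clip(msg_av, 1e-12, None))
        msg_va /= msg_va.sum(axis=1, keepdims=True)
        gamma_v = forward_backward(A, video_lik, extrinsic=msg_va)
        msg_av = gamma_v / np.clip(msg_va, 1e-12, None)
        msg_av /= msg_av.sum(axis=1, keepdims=True)
    fused = gamma_a * gamma_v               # simple product fusion
    return fused / fused.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, S = 50, 5
    A = rng.dirichlet(np.ones(S) * 2, size=S)      # synthetic transitions
    audio_lik = rng.random((T, S)) + 1e-3          # noisy acoustic scores
    video_lik = rng.random((T, S)) + 1e-3          # visual (lip) scores
    gamma = turbo_decode(A, audio_lik, video_lik)
    print(gamma.shape, gamma[0].round(3))
```

In the system described by the paper, such fused state posteriors would then drive the Twin-HMM-based reconstruction of the clean speech signal; the product fusion used above is only one possible combination rule and is chosen here purely for illustration.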
