Annual Conference of the International Speech Communication Association

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement



Abstract

Models for automatic speech recognition (ASR) hold detailed information about the spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase recognition rates, especially in low-SNR conditions, we suggest using the additional visual modality, which is mostly unaffected by degradations in the acoustic channel. An optimal integration of acoustic and visual information is achievable by joint inference over both modalities within the turbo-decoding framework. By thus combining turbo-decoding with Twin-HMMs for speech enhancement, notable improvements can be achieved, not only in instrumental estimates of speech quality but also in actual speech intelligibility. This is verified through listening tests, which show that in highly challenging noise conditions, average human recognition accuracy can be improved from 64% without signal processing to 80% with the presented architecture.
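The turbo-decoding idea summarized in the abstract can be illustrated with a small example. The following is a minimal, hypothetical sketch, not the authors' implementation: two HMM decoders over a shared state space, one driven by acoustic and one by visual observation likelihoods, iteratively exchange per-frame state posteriors as extrinsic priors. The function names (forward_backward, turbo_decode), the synthetic model parameters, and the product fusion at the end are all illustrative assumptions, and the Twin-HMM step that maps decoded states back to a clean-speech estimate is omitted.

```python
# Hedged sketch (not the paper's code): turbo-style exchange of state
# posteriors between an acoustic and a visual HMM decoder that are assumed
# to share one state space. All parameters below are synthetic placeholders.
import numpy as np

def forward_backward(A, obs_lik, extrinsic=None):
    """Per-frame state posteriors gamma[t, s] for an HMM with transition
    matrix A and frame-wise observation likelihoods obs_lik[t, s];
    'extrinsic' holds optional per-frame priors from the other modality."""
    T, S = obs_lik.shape
    b = obs_lik.copy()
    if extrinsic is not None:
        b *= extrinsic                      # fuse the other decoder's message
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = b[0] / S
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                   # forward pass (normalized)
        alpha[t] = b[t] * (alpha[t - 1] @ A)
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):          # backward pass (normalized)
        beta[t] = A @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def turbo_decode(A, audio_lik, video_lik, iterations=4):
    """Iteratively exchange extrinsic state posteriors between the acoustic
    and visual decoders (turbo principle); returns fused posteriors."""
    msg_av = None                           # message video -> audio
    for _ in range(iterations):
        gamma_a = forward_backward(A, audio_lik, extrinsic=msg_av)
        # extrinsic part of the audio posterior (divide out the video message)
        msg_va = (gamma_a.copy() if msg_av is None
                  else gamma_a / np.clip(msg_av, 1e-12, None))
        msg_va /= msg_va.sum(axis=1, keepdims=True)
        gamma_v = forward_backward(A, video_lik, extrinsic=msg_va)
        msg_av = gamma_v / np.clip(msg_va, 1e-12, None)
        msg_av /= msg_av.sum(axis=1, keepdims=True)
    fused = gamma_a * gamma_v               # simple product fusion
    return fused / fused.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, S = 50, 5
    A = rng.dirichlet(np.ones(S) * 2, size=S)      # synthetic transitions
    audio_lik = rng.random((T, S)) + 1e-3          # noisy acoustic scores
    video_lik = rng.random((T, S)) + 1e-3          # visual (lip) scores
    gamma = turbo_decode(A, audio_lik, video_lik)
    print(gamma.shape, gamma[0].round(3))
```

In the system described by the paper, such fused state posteriors would then drive the Twin-HMM-based reconstruction of the clean speech signal; the product fusion used above is only one possible combination rule and is chosen here purely for illustration.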
