
Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition



Abstract

The demand for audio-visual speech recognition (AVSR) has increased in order to make speech recognition systems robust to acoustic noise. There are two main research issues in audio-visual speech recognition: integration modeling that accounts for asynchronicity between modalities, and adaptive information weighting according to information reliability. This paper proposes a method to effectively integrate audio and visual information. Such integration inevitably necessitates modeling the synchronization and asynchronization of audio and visual information. To address the time lag and correlation problems between speech and lip-movement features, we introduce an integrated HMM model of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme, but also between phonemes. Furthermore, we propose a rapid stream weight optimization based on the GPD algorithm for noisy, bimodal speech recognition. Evaluation experiments show that the proposed method improves the recognition accuracy for noisy speech. At SNR = 0 dB, our proposed method attained 16% higher performance compared to a product HMM without synchronicity re-estimation.
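The two ideas summarized above can be illustrated with a minimal sketch: a product HMM whose composite states are pairs of audio and visual states (with a bound on how far the two streams may drift apart), and a stream-weighted log-likelihood combination. This is an illustrative assumption-laden sketch, not the paper's implementation; all names and parameters here are hypothetical.

```python
def product_states(n_audio, n_visual, max_async=1):
    """Composite states (i, j) of a product HMM.

    Each composite state pairs an audio-stream state i with a
    visual-stream state j. `max_async` bounds |i - j|, i.e. how far
    the two streams may drift apart; max_async=0 forces strict
    state synchrony within the unit.
    """
    return [(i, j)
            for i in range(n_audio)
            for j in range(n_visual)
            if abs(i - j) <= max_async]

def combined_log_likelihood(log_p_audio, log_p_visual, weight):
    """Stream-weighted score: w * log P_a + (1 - w) * log P_v.

    In an AVSR system `weight` would be adapted to the acoustic
    reliability (e.g. tuned by a GPD-style gradient update as SNR
    drops); here it is simply a fixed parameter.
    """
    return weight * log_p_audio + (1.0 - weight) * log_p_visual
```

With three states per stream, strict synchrony (`max_async=0`) allows only the three diagonal pairs, while `max_async=1` admits seven composite states, letting one modality lag the other by a single state.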
