International Conference on Computer Engineering and Systems

Robust Audio-Visual Speech Recognition System based on Gabor Features and Dynamic Stream Weight Adaption



Abstract

This paper aims to enhance the performance of audio-visual speech recognition (AVSR) systems through contributions to both the front-end and back-end stages. Identifying reliable features is a crucial step towards improving the front end of both the audio module and the visual module. Two-dimensional Gabor filters with different scales and orientations are used to generate a set of noise-robust spectro-temporal audio and visual features. The performance achieved with the Gabor audio features (GAFs) and Gabor visual features (GVFs) is compared with that of traditional audio features such as MFCC, PLP and RASTA-PLP, and of visual features such as DCT2. The experimental results demonstrate that a system using Gabor features in the front end performs much better, especially at low SNR levels. To enhance the back-end stage, a framework based on a synchronous multi-stream hidden Markov model is proposed to solve the dynamic stream weight estimation problem. To demonstrate the effect of dynamic weighting on AVSR performance, we empirically compare late integration (LI) and early integration (EI) strategies, with particular attention to low-SNR scenarios. The experimental results show that the AVSR-LI system outperforms the AVSR-EI system at all SNR levels.
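The abstract does not give the back-end equations. In the usual synchronous multi-stream HMM formulation, each state j scores a frame t by a weighted sum of per-stream log-likelihoods, log b_j(o_t) = λ_t log b_j^A(o_t^A) + (1 - λ_t) log b_j^V(o_t^V), where λ_t is the audio stream weight for that frame. The Python sketch below illustrates this kind of late integration with a per-frame (dynamic) weight, followed by ordinary Viterbi decoding. It is not the authors' method: the SNR-to-weight sigmoid in `dynamic_stream_weight` and its `midpoint`/`slope` parameters are hypothetical stand-ins for whatever weight estimator the paper proposes, and all function names are illustrative.

```python
import numpy as np


def dynamic_stream_weight(snr_db, midpoint=10.0, slope=0.3):
    """Map an estimated per-frame SNR (dB) to an audio stream weight in (0, 1).

    ASSUMPTION: this sigmoid mapping and its parameters are hypothetical;
    the paper's own weight-estimation method is not described in the abstract.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint)))


def combine_streams(logb_audio, logb_video, lam):
    """Late integration: per-frame weighted sum of stream log-likelihoods.

    logb_audio, logb_video: arrays of shape (T, N), log b_j(o_t) for each stream.
    lam: array of shape (T,), the audio weight per frame; video gets 1 - lam.
    """
    lam = np.asarray(lam, dtype=float)[:, None]  # broadcast over the N states
    return lam * logb_audio + (1.0 - lam) * logb_video


def viterbi(log_pi, log_A, log_b):
    """Standard Viterbi decoding on combined emission scores of shape (T, N)."""
    T, N = log_b.shape
    delta = log_pi + log_b[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path into i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(N)] + log_b[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(delta))


if __name__ == "__main__":
    log_pi = np.log([0.5, 0.5])
    log_A = np.log([[0.8, 0.2], [0.2, 0.8]])
    # Toy scores: the audio stream favours state 0 everywhere, while the
    # video stream switches to state 1 in the last two (noisy) frames.
    logb_audio = np.array([[-1.0, -3.0]] * 4)
    logb_video = np.array([[-1.0, -3.0], [-1.0, -3.0], [-3.0, -1.0], [-3.0, -1.0]])
    snr_db = [30.0, 30.0, -5.0, -5.0]

    lam = dynamic_stream_weight(snr_db)
    path, _ = viterbi(log_pi, log_A, combine_streams(logb_audio, logb_video, lam))
    print(path)  # [0, 0, 1, 1]: video dominates once the SNR drops
```

Because the weight is applied per frame, the decoder can lean on the visual stream only where the audio is unreliable. Early integration, by contrast, concatenates the audio and visual feature vectors into a single stream modelled by one emission density, so there is no per-frame weight to adjust when the SNR changes.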
