Robust Audio-Visual Speech Recognition System based on Gabor Features and Dynamic Stream Weight Adaption

Abstract

This paper aims to improve the performance of audio-visual speech recognition (AVSR) systems through contributions to both the front-end and back-end stages. Identifying reliable features is a crucial step in improving the front end of both the audio and visual modules. Two-dimensional Gabor filters with different scales and directions are used to generate a set of noise-robust spectro-temporal audio and visual features. The performance of the Gabor audio features (GAFs) and Gabor visual features (GVFs) is compared with that of traditional audio features such as MFCC, PLP, and RASTA-PLP, and of visual features such as DCT2. The experimental results demonstrate that a system using Gabor features in the front end performs much better, especially at low SNR levels. To enhance the back-end stage, a framework based on a synchronous multi-stream hidden Markov model is proposed to solve the dynamic stream weight estimation problem. To demonstrate the effect of dynamic weighting on AVSR performance, we empirically compare late integration (LI) and early integration (EI) strategies, especially in low-SNR scenarios. The experimental results show that the AVSR-LI system outperforms the AVSR-EI system at all SNR levels.
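
The two ideas named in the abstract, a bank of 2-D Gabor filters over several scales and directions and a stream-weighted combination of audio and visual scores, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no filter parameters and does not describe the weight estimator, so the kernel size, wavelengths, number of directions, the `sigma = 0.5 * wavelength` rule, and the function names below are all assumptions chosen for illustration. The fusion uses the common multi-stream form in which per-stream log-likelihoods are weighted by lambda and 1 - lambda.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine carrier.

    All parameters are illustrative; the abstract does not specify them.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinates so the carrier oscillates along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


def gabor_bank(size=15, wavelengths=(4.0, 8.0), n_directions=4):
    """Build one kernel per (scale, direction) pair (hypothetical defaults)."""
    thetas = [k * np.pi / n_directions for k in range(n_directions)]
    return [
        gabor_kernel(size, w, t, sigma=0.5 * w)
        for w in wavelengths
        for t in thetas
    ]


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Weighted sum of per-frame, per-state log-likelihoods from two streams.

    audio_ll, video_ll: arrays of shape (frames, states).
    audio_weight: one weight per frame in [0, 1]; the video stream gets 1 - weight.
    How the weights are estimated is outside this sketch.
    """
    lam = np.clip(np.atleast_1d(np.asarray(audio_weight, dtype=float)), 0.0, 1.0)
    lam = lam[:, None]  # broadcast each frame's weight over the states
    return lam * audio_ll + (1.0 - lam) * video_ll


# Usage: two frames, two states; frame 0 trusts audio only, frame 1 video only.
bank = gabor_bank()
audio_ll = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
video_ll = np.log(np.array([[0.2, 0.8], [0.1, 0.9]]))
fused = fuse_log_likelihoods(audio_ll, video_ll, audio_weight=[1.0, 0.0])
```

A per-frame weight is what makes the weighting dynamic: in clean audio the weight can stay near 1, and as the SNR drops it can shift toward the visual stream. A fixed weight would apply the same trade-off to every frame.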

Bibliographic details

Source
International Conference on Computer Engineering and Systems | 2018 | pp. 399-403 | 5 pages
Authors
Ali Saudi; Mahmoud I. Khalil; Hazem Abbas
Original format: PDF
Keywords
Visualization; Hidden Markov models; Signal to noise ratio; Speech recognition; Feature extraction; Adaptation models; Mel frequency cepstral coefficient

Similar literature

1. Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework [J]. Saudi Ali S., Khalil Mahmoud I., Abbas Hazem M. Digital Signal Processing. 2019
2. Learning Dynamic Stream Weights For Coupled-HMM-Based Audio-Visual Speech Recognition [J]. Abdelaziz Ahmed Hussen, Zeiler Steffen, Kolossa Dorothea. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2015, No. 5
3. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition [J]. Martin Heckmann, Frédé… EURASIP Journal on Advances in Signal Processing. 2002, No. 11
4. Robust Audio-Visual Speech Recognition System based on Gabor Features and Dynamic Stream Weight Adaption [C]. Ali Saudi, Mahmoud I. Khalil, Hazem Abbas. International Conference on Computer Engineering and Systems. 2018
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005
6. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features [O]. Tursunov Anvarjon, Mustaqeem, Soonil Kwon. 2020
7. Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition [O]. Louis H. Terry, Derek J. Shiell, Aggelos K. Katsaggelos. 2014
