FRAME-DEPENDENT MULTI-STREAM RELIABILITY INDICATORS FOR AUDIO-VISUAL SPEECH RECOGNITION

Abstract

We investigate the use of local, frame-dependent reliability indicators of the audio and visual modalities as a means of estimating the stream exponents of multi-stream hidden Markov models for audio-visual automatic speech recognition. We consider two such indicators for each modality, defined as functions of the speech-class conditional observation probabilities of appropriate audio-only or visual-only classifiers. We subsequently map the four reliability indicators into the stream exponents of a state-synchronous, two-stream hidden Markov model, as a sigmoid function of their linear combination. We propose two algorithms to estimate the sigmoid weights, based on the maximum conditional likelihood and minimum classification error criteria. We demonstrate the superiority of the proposed approach on a connected-digit audio-visual speech recognition task, under varying audio channel noise conditions. Indeed, the use of the estimated, frame-dependent stream exponents results in a significantly smaller word error rate than using global stream exponents. In addition, it outperforms utterance-level exponents, even though the latter utilize a priori knowledge of the utterance noise level.
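
The mapping described in the abstract, from four per-frame reliability indicators to a stream exponent through a sigmoid of their linear combination, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the bias term, the weight values, and the convention that the audio and visual exponents sum to one are not stated in the abstract. The weights here are hand-picked placeholders; the paper estimates them with maximum conditional likelihood or minimum classification error training, which this sketch does not attempt.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def audio_stream_exponent(indicators, weights, bias=0.0):
    """Map per-frame reliability indicators to an audio exponent in (0, 1).

    `indicators` holds the four reliability values for one frame (two audio,
    two visual). `weights` and `bias` are hypothetical sigmoid parameters.
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    z = bias + sum(w * d for w, d in zip(weights, indicators))
    return sigmoid(z)


def two_stream_log_score(log_p_audio, log_p_visual, lam_audio):
    """Exponent-weighted two-stream log-likelihood for one frame and state.

    Assumes the visual exponent is 1 - lam_audio (an assumed convention).
    """
    return lam_audio * log_p_audio + (1.0 - lam_audio) * log_p_visual


if __name__ == "__main__":
    # Hypothetical weights: audio indicators raise the audio exponent,
    # visual indicators lower it.
    weights = [0.8, 0.5, -0.6, -0.4]
    # Hypothetical frames: (indicators, audio log-prob, visual log-prob).
    frames = [
        ([2.0, 1.5, 0.5, 0.4], -3.1, -5.0),  # audio looks reliable
        ([0.2, 0.1, 1.8, 1.6], -7.4, -2.9),  # audio looks noisy
    ]
    for indicators, log_pa, log_pv in frames:
        lam = audio_stream_exponent(indicators, weights)
        score = two_stream_log_score(log_pa, log_pv, lam)
        print(f"lambda_audio={lam:.3f}  combined_log_score={score:.3f}")
```

Because the exponent is recomputed for every frame, the audio stream can be down-weighted only where it appears unreliable, which is the contrast with global and utterance-level exponents drawn in the abstract.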

Bibliographic record

Source: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003, 4 pages
Author: IEEE
Original format: PDF
CLC classification (Chinese Library Classification): TN912-53

Related documents

1. Adaptive Reliability Measure and Optimum Integration Weight for Decision Fusion Audio-visual Speech Recognition [J]. R. Rajavel, P. S. Sathidevi. Journal of signal processing systems for signal, image, and video technology. 2012, No. 1
2. A new GA optimised Reliability Ratio based integration weight estimation scheme for decision fusion Audio-Visual Speech Recognition [J]. R. Rajavel, P. S. Sathidevi. International Journal of Signal and Imaging Systems Engineering. 2011, No. 2
3. The Effect of Reliability Measure on Integration Weight Estimation in Audio-Visual Speech Recognition [J]. R. Rajavel, P. S. Sathidevi. International Journal of Engineering Science and Technology. 2010, No. 8
4. Frame-dependent multi-stream reliability indicators for audio-visual speech recognition [C]. Ashutosh Garg, Gerasimos Potamianos, Chalapathy Neti. International Conference on Multimedia and Expo. 2003
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005
6. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age? [O]. Magnus Alm, Dawn Behne
7. Frame-dependent multi-stream reliability indicators for audio-visual speech recognition [O]. Ashutosh Garg, Gerasimos Potamianos, Chalapathy Neti. 2003
