
Integrating audio and visual information to provide highly robust speech recognition


Abstract

Many human-machine interactions require accurate automatic speech recognition in the presence of high levels of interfering noise. The paper shows that improvements in recognition accuracy can be obtained by including data derived from a speaker's lip images. We describe the combination of the audio and visual data in the construction of composite feature vectors, and a hidden Markov model structure which allows for asynchrony between the audio and visual components. These ideas are applied to a speaker-dependent recognition task involving a small vocabulary and subject to interfering noise. The recognition results obtained using composite vectors and cross-product models are compared with those based on an audio-only feature vector. The benefit of this approach is shown to be increased performance over a very wide range of noise levels.
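The composite feature vectors described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes hypothetical per-frame audio features (e.g. cepstral coefficients) and lip-shape parameters extracted at a lower video frame rate, upsamples the visual stream to the audio frame rate by linear interpolation, and concatenates the two per frame:

```python
import numpy as np

def composite_features(audio_feats, visual_feats):
    """Upsample visual features to the audio frame rate, then
    concatenate audio and visual features frame by frame.

    audio_feats:  (T_audio, D_a) array, one row per audio frame
    visual_feats: (T_video, D_v) array, one row per video frame
    """
    T = audio_feats.shape[0]
    # Map both streams onto a common normalized time axis [0, 1]
    # and linearly interpolate each visual dimension onto the
    # T audio-frame positions.
    idx_src = np.linspace(0.0, 1.0, visual_feats.shape[0])
    idx_dst = np.linspace(0.0, 1.0, T)
    visual_up = np.stack(
        [np.interp(idx_dst, idx_src, visual_feats[:, d])
         for d in range(visual_feats.shape[1])],
        axis=1,
    )
    # Composite vector: audio and visual components side by side.
    return np.hstack([audio_feats, visual_up])

# Illustrative sizes only: 100 audio frames of 12 coefficients,
# 25 video frames of 6 lip-shape parameters.
audio = np.random.randn(100, 12)
visual = np.random.randn(25, 6)
comp = composite_features(audio, visual)
print(comp.shape)  # (100, 18)
```

Frame-level concatenation like this forces the two streams into lockstep; the cross-product HMM structure mentioned in the abstract instead models each state as a pair of audio and visual states, which is what allows the two components to be asynchronous.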

