Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop

Abstract

We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two problems of audio-visual ASR were mainly addressed: visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered, obtained by means of the discrete cosine transform (DCT) and active appearance models, respectively. The former were demonstrated to yield superior automatic speech reading. Subsequently, a number of feature fusion and decision fusion techniques for combining the DCT visual features with traditional acoustic ones were implemented and compared. Hierarchical discriminant feature fusion and asynchronous decision fusion by means of the multi-stream hidden Markov model consistently improved ASR for both clean and noisy speech. Compared to an equivalent audio-only recognizer, introducing the visual modality reduced ASR word error rate by 7% relative in clean speech, and by 27% relative at an 8.5 dB SNR audio condition.
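
The abstract quotes its gains as relative word error rate (WER) reductions and names the multi-stream hidden Markov model as a fusion method. The sketch below illustrates both ideas in their simplest form: a relative reduction is the WER drop divided by the baseline WER, and a stream-weighted score is a weighted sum of per-stream log-likelihoods. It is not the workshop's implementation. It shows only state-synchronous score combination and leaves out the asynchronous modelling the abstract mentions. The function names, class labels, stream weight, and all numeric values are hypothetical and chosen for illustration.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine per-class log-likelihoods from two streams.

    audio_weight lies in [0, 1]; the visual stream gets 1 - audio_weight.
    This is an assumed, simplified form of stream weighting.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + visual_weight * visual_ll[label]
        for label in audio_ll
    }


def best_class(scores):
    """Return the label with the highest combined score."""
    return max(scores, key=scores.get)


# Hypothetical WERs (not from the paper): 10.0% audio-only, 9.3% audio-visual.
reduction = relative_reduction(10.0, 9.3)

# Hypothetical log-likelihoods for two classes under noisy audio.
audio_ll = {"ba": -4.0, "da": -3.5}
visual_ll = {"ba": -1.0, "da": -6.0}

audio_only = best_class(fuse_log_likelihoods(audio_ll, visual_ll, 1.0))
audio_visual = best_class(fuse_log_likelihoods(audio_ll, visual_ll, 0.7))
```

With these made-up scores the audio stream alone prefers one class, while giving the visual stream a weight of 0.3 changes the decision. In a real system the stream weight would typically be tuned per noise condition.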

Bibliographic record

Source
2001, pp. 619-624 (6 pages)
Authors
Neti, C.; Potamianos, G.; Luettin, J.
Original format
PDF
Chinese Library Classification
Radio electronics; telecommunications technology

Similar literature

1. Reducing latency for language identification based on large-vocabulary continuous speech recognition [J]. Takuma Okamoto, Atsuo Hiroe, Hisashi Kawai. Acoustical Science and Technology. 2017, no. 1
2. A segmental framework for fully-unsupervised large-vocabulary speech recognition [J]. Kamper Herman, Jansen Aren, Goldwater Sharon. Computer Speech and Language. 2017, issue "nova"
3. Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances [J]. Saon G., Chien J.-T. Signal Processing Magazine, IEEE. 2012, no. 6
4. Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop [C]. Chalapathy Neti, Gerasimos Potamianos, Juergen Luettin. IEEE Workshop on Multimedia Signal Processing. 2001
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005
6. Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop [O]. Mark Hasegawa-Johnson, James Baker, Sarah Borys
7. Large-Vocabulary Audio-Visual Speech Recognition: A Summary of the Johns Hopkins Summer 2000 Workshop [O]. Chalapathy Neti, Gerasimos Potamianos, Juergen Luettin. 2001
8. Novel Approaches to Arabic Speech Recognition: Report from the 2002 Johns-Hopkins Summer Workshop [R]. Kirchhoff, K., Bilmes, J., Das, S. 2015
