
Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop



Abstract

We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. The workshop mainly addressed two problems of audio-visual ASR: visual feature extraction and audio-visual information fusion. First, image-transform and model-based visual features were considered, obtained by means of the discrete cosine transform (DCT) and active appearance models, respectively; the former were shown to yield superior automatic speechreading. Subsequently, a number of feature fusion and decision fusion techniques for combining the DCT visual features with traditional acoustic ones were implemented and compared. Hierarchical discriminant feature fusion and asynchronous decision fusion by means of the multi-stream hidden Markov model consistently improved ASR for both clean and noisy speech. Compared to an equivalent audio-only recognizer, introducing the visual modality reduced the ASR word error rate by 7% relative in clean speech, and by 27% relative at an 8.5 dB SNR audio condition.
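As a rough illustration of image-transform visual features, the Python sketch below computes a separable, orthonormal 2-D DCT of a grayscale mouth region of interest (ROI) and keeps a small set of low-frequency coefficients as the visual feature vector. The function names, the default number of retained coefficients (`num_coeffs = 24`), and the low-frequency selection rule are hypothetical choices for illustration, not the workshop's configuration; any further normalization or projection applied to the DCT features in the workshop is not shown.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal n x n DCT-II basis (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # pixel index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return basis


def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II: transform along columns, then along rows."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def dct_visual_features(roi: np.ndarray, num_coeffs: int = 24) -> np.ndarray:
    """Keep the num_coeffs lowest-frequency 2-D DCT coefficients of a mouth ROI.

    Coefficients are ranked by u + v (ties broken by u), a simple
    zig-zag-like low-frequency selection assumed here for illustration.
    """
    coeffs = dct2(np.asarray(roi, dtype=float))
    rows, cols = coeffs.shape
    order = sorted(
        ((u, v) for u in range(rows) for v in range(cols)),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return np.array([coeffs[u, v] for u, v in order[:num_coeffs]])
```

For example, `dct_visual_features(np.ones((8, 8)), num_coeffs=5)` returns approximately `[8, 0, 0, 0, 0]`: a uniform patch has energy only in the DC coefficient, while lip shape and opening show up in the low-order AC terms of a real ROI.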
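The decision-fusion idea behind a multi-stream HMM can be sketched as follows: each state's emission log-likelihood is a weighted sum of the audio-stream and visual-stream log-likelihoods, with non-negative stream weights that sum to one, and decoding runs on the combined scores. This Python sketch assumes the per-stream log-likelihoods are already computed (for instance by per-stream Gaussian mixtures, not shown), uses a single global audio weight, and decodes with a plain Viterbi pass. It combines the streams at the state level (state-synchronous) and does not model the asynchrony between streams referred to in the abstract. All function names are hypothetical.

```python
import numpy as np


def combine_streams(log_b_audio: np.ndarray, log_b_visual: np.ndarray,
                    audio_weight: float) -> np.ndarray:
    """Exponent-weighted stream combination of emission log-likelihoods.

    Inputs have shape (frames, states); the visual weight is 1 - audio_weight,
    so the two stream weights are non-negative and sum to one.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_visual


def viterbi(log_emit: np.ndarray, log_trans: np.ndarray,
            log_init: np.ndarray) -> tuple[list[int], float]:
    """Best state path and its log score for one HMM, given emission scores."""
    frames, states = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((frames, states), dtype=int)
    for t in range(1, frames):
        cand = score[:, None] + log_trans  # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

Setting `audio_weight` to 1 reduces this to an audio-only decoder and setting it to 0 gives a visual-only one; intermediate values trade the two streams off. In practice such weights are typically tuned on held-out data, possibly separately for each noise condition.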
