Conference: IEEE Autotestcon Conference

Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition

Abstract

Most current audio-visual automatic speech recognition (AVASR) systems use static weights to balance audio and visual information during fusion. State-of-the-art research has used audio reliability metrics to change the fusion weights dynamically, which improves overall recognition results. So far, however, incorporating visual reliability metrics into these audio-reliability-based systems has not significantly improved performance. We introduce a new approach to this problem: we infer the “consistency” between the audio and visual information and leverage the existing audio reliability metrics to create a video reliability metric. Our approach is formulated in the extracted feature space and thus does not rely on analyzing the video signal itself. The framework presented in this work is competitive with systems based on audio-only reliability metrics and shows promise of consistently outperforming them.
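To make the distinction between static and dynamic stream weights concrete, the following is a minimal sketch of a weighted log-linear combination of per-frame audio and video log-likelihoods, a formulation commonly used for stream weighting. The abstract does not state the paper's actual fusion rule or how the reliability and consistency metrics are computed, so the function name `fuse_log_likelihoods`, the toy scores, and the weight values are all illustrative assumptions.

```python
import math


def fuse_log_likelihoods(audio_ll, video_ll, audio_weights):
    """Fuse two streams frame by frame (hypothetical helper, not from the paper).

    audio_ll, video_ll: lists of frames, each frame a list of per-class
    log-likelihoods. audio_weights: one weight per frame; the video stream
    receives (1 - weight). Weights are clamped to [0, 1].
    """
    fused = []
    for a_frame, v_frame, lam in zip(audio_ll, video_ll, audio_weights):
        lam = min(max(lam, 0.0), 1.0)
        fused.append([lam * a + (1.0 - lam) * v
                      for a, v in zip(a_frame, v_frame)])
    return fused


# Toy scores for two frames and two classes (made-up numbers).
audio_ll = [[math.log(0.9), math.log(0.1)], [math.log(0.5), math.log(0.5)]]
video_ll = [[math.log(0.6), math.log(0.4)], [math.log(0.2), math.log(0.8)]]

# Static weighting: the same audio weight is applied to every frame.
static_fused = fuse_log_likelihoods(audio_ll, video_ll, [0.7, 0.7])

# Dynamic weighting: the weight changes per frame. Here the values are
# assumed; in a real system they would come from a reliability estimate.
dynamic_fused = fuse_log_likelihoods(audio_ll, video_ll, [0.9, 0.1])

# Pick the best-scoring class in each frame.
static_best = [frame.index(max(frame)) for frame in static_fused]
dynamic_best = [frame.index(max(frame)) for frame in dynamic_fused]
```

In this sketch the only difference between the static and dynamic cases is where the per-frame weights come from; a reliability or consistency metric would replace the hard-coded list in the dynamic call.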