Attention-Based Audio-Visual Fusion for Video Summarization

Abstract

Video summarization compresses videos while preserving the content most meaningful to users. Many image-based methods focus on how to exploit visual cues effectively to select keyframes. However, besides visual content, videos also contain useful audio information. In this paper, we propose a novel attention-based audio-visual fusion framework that integrates audio information with visual information. Our framework consists of two key components: an asymmetrical self-attention mechanism and odd-even attention. The asymmetrical self-attention mechanism accounts for the fact that visual information is more strongly related to video summarization than audio information. The odd-even attention is designed to reduce memory requirements. In addition, we create ViAu-SumMe, an audio-visual dataset based on the SumMe dataset. Experimental results on this dataset show that our proposed method outperforms state-of-the-art methods.
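The abstract names the two components but does not say how either is computed. As a rough illustration only, the following Python sketch assumes one plausible reading: "asymmetrical" is modeled as visual frames querying the audio stream (never the reverse), with the attended audio added at a smaller weight (`audio_weight`, default 0.3), and "odd-even" attention is modeled as running self-attention separately over even- and odd-indexed frames, which roughly halves the number of pairwise attention scores. All function names, the weighting, and the final sigmoid scoring are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query yields a weighted sum of the values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def odd_even_self_attention(frames: List[Vector]) -> List[Vector]:
    """Hypothetical odd-even scheme: even- and odd-indexed frames attend only within
    their own subset, so two (T/2 x T/2) score matrices replace one T x T matrix."""
    out: List[Vector] = [[] for _ in frames]
    for start in (0, 1):
        idx = list(range(start, len(frames), 2))
        if not idx:  # a single-frame input has no odd-indexed frames
            continue
        subset = [frames[i] for i in idx]
        for i, vec in zip(idx, attend(subset, subset, subset)):
            out[i] = vec
    return out


def attention_score_count(num_frames: int, odd_even: bool) -> int:
    """Number of pairwise attention scores held in memory for one self-attention pass."""
    if not odd_even:
        return num_frames * num_frames
    n_even, n_odd = (num_frames + 1) // 2, num_frames // 2
    return n_even * n_even + n_odd * n_odd


def fuse_and_score(visual: List[Vector], audio: List[Vector],
                   audio_weight: float = 0.3,
                   w: Optional[Vector] = None) -> List[float]:
    """Hypothetical asymmetric fusion: visual frames query the audio stream (not the
    reverse), and the attended audio is added with a smaller weight than the visual
    context. Returns one importance score in (0, 1) per frame."""
    visual_ctx = odd_even_self_attention(visual)
    audio_ctx = attend(visual, audio, audio)  # visual queries, audio keys/values
    fused = [[v + audio_weight * a for v, a in zip(vf, af)]
             for vf, af in zip(visual_ctx, audio_ctx)]
    if w is None:
        w = [1.0] * len(fused[0])  # placeholder weights; a real model would learn these
    return [1.0 / (1.0 + math.exp(-dot(w, f))) for f in fused]


# Usage: four frames with 2-D toy features for each modality.
visual = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.9], [0.0, 0.5]]
audio = [[0.2, 0.0], [0.1, 0.3], [0.5, 0.5], [0.2, 0.1]]
scores = fuse_and_score(visual, audio)  # four frame-importance scores in (0, 1)
```

With `T` frames, full self-attention stores `T*T` scores, while this odd-even split stores roughly half as many (for `T = 10`, 50 instead of 100), which is one way such a design could lower memory use. Keyframes could then be chosen by ranking the per-frame scores.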