Revisiting Video Saliency: A Large-Scale Benchmark and a New Model

Abstract

In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during dynamic scene free-viewing, which has long been called for in this field. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, carefully selected video sequences spanning a wide range of scenes, motions, object types and background complexity. Existing video saliency datasets lack the variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in terms of scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The attention mechanism explicitly encodes static saliency information, allowing the LSTM to focus on learning a more flexible temporal saliency representation across successive frames. Such a design fully leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. We thoroughly examine the performance of our model against state-of-the-art saliency models on three large-scale datasets (i.e., DHF1K, Hollywood2, UCF Sports). Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that our model outperforms its competitors.
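The abstract describes the model only at a high level. As a rough illustration of the idea, the following is a minimal PyTorch sketch of a CNN whose features are gated by a learned static-saliency attention map before a ConvLSTM models the temporal dynamics. Everything here (the toy backbone, layer sizes, the residual-style gating, and the class names) is an illustrative assumption, not the authors' released implementation.

import torch
import torch.nn as nn


class StaticAttention(nn.Module):
    """Predicts a static-saliency attention map and gates features with it.

    A branch like this can be supervised with large-scale static fixation
    data, which is the leverage the abstract describes; the exact layers
    here are assumptions.
    """

    def __init__(self, channels):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
            nn.Sigmoid(),  # per-pixel attention in [0, 1]
        )

    def forward(self, feats):
        a = self.att(feats)          # (B, 1, h, w)
        return feats * (1 + a), a    # residual-style gating keeps raw features


class ConvLSTMCell(nn.Module):
    """A standard convolutional LSTM cell with a spatial hidden state."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class VideoSaliencyNet(nn.Module):
    """Toy CNN -> static attention -> ConvLSTM -> per-frame saliency maps."""

    def __init__(self, feat_ch=64, hid_ch=64):
        super().__init__()
        # Stand-in backbone; a real model would use pretrained CNN features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attention = StaticAttention(feat_ch)
        self.convlstm = ConvLSTMCell(feat_ch, hid_ch)
        self.readout = nn.Conv2d(hid_ch, 1, kernel_size=1)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) -> saliency maps (B, T, 1, H/4, W/4)
        B, T = clip.shape[:2]
        h = c = None
        maps = []
        for t in range(T):
            feats = self.backbone(clip[:, t])
            feats, _ = self.attention(feats)   # static saliency gates features
            if h is None:                      # lazy init of recurrent state
                h = feats.new_zeros(B, self.convlstm.hid_ch, *feats.shape[2:])
                c = torch.zeros_like(h)
            h, c = self.convlstm(feats, h, c)  # temporal saliency dynamics
            maps.append(torch.sigmoid(self.readout(h)))
        return torch.stack(maps, dim=1)


model = VideoSaliencyNet()
out = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 frames each
print(out.shape)  # torch.Size([2, 8, 1, 16, 16])

In an arrangement like this, the attention branch can be pre-trained on static fixation datasets and then fine-tuned jointly, which matches the mechanism the abstract credits for reduced overfitting and faster training.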