Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

Abstract

The ability to capture temporal information has been critical to the development of video understanding models. While there have been numerous attempts at modeling motion in videos, an explicit analysis of the effect of temporal information for video understanding is still missing. In this work, we aim to bridge this gap and ask the following question: How important is the motion in the video for recognizing the action? To this end, we propose two novel frameworks: (i) class-agnostic temporal generator and (ii) motion-invariant frame selector to reduce/remove motion for an ablation analysis without introducing other artifacts. This isolates the analysis of motion from other aspects of the video. The proposed frameworks provide a much tighter estimate of the effect of motion (from 25% to 6% on UCF101 and 15% to 5% on Kinetics) compared to baselines in our analysis. Our analysis provides critical insights about existing models like C3D, and how it could be made to achieve comparable results with a sparser set of frames.
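To make the idea of a motion ablation concrete, here is a minimal sketch of a naive way to remove motion from a clip: repeat a single frame across the whole time axis. This is an illustrative assumption, not the class-agnostic temporal generator or the motion-invariant frame selector proposed in the paper. The names `freeze_clip`, `motion_energy`, and `accuracy_drop` are hypothetical helpers, and the clip layout `(T, H, W, C)` is assumed.

```python
from typing import Optional

import numpy as np


def freeze_clip(clip: np.ndarray, index: Optional[int] = None) -> np.ndarray:
    """Return a motion-free clip by repeating one frame across time.

    clip is assumed to have shape (T, H, W, C). If index is None,
    the middle frame is used. This is a naive ablation, not the
    method proposed in the paper.
    """
    num_frames = clip.shape[0]
    if index is None:
        index = num_frames // 2
    frame = clip[index]
    # Tile the chosen frame so the output keeps the original clip length.
    return np.repeat(frame[np.newaxis], num_frames, axis=0)


def motion_energy(clip: np.ndarray) -> float:
    """Mean absolute difference between consecutive frames (0.0 means no motion)."""
    if clip.shape[0] < 2:
        return 0.0
    diffs = np.abs(np.diff(clip.astype(np.float64), axis=0))
    return float(diffs.mean())


def accuracy_drop(acc_full: float, acc_ablated: float) -> float:
    """Accuracy lost when the model sees the ablated clips instead of the originals."""
    return acc_full - acc_ablated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 4, 4, 3))  # synthetic stand-in for a video clip
    frozen = freeze_clip(clip)
    print("motion before:", motion_energy(clip))
    print("motion after: ", motion_energy(frozen))
```

Evaluating a trained model on original and frozen clips and passing the two accuracies to `accuracy_drop` would give a rough estimate of how much the model relies on motion. A naive ablation like this can change the input in ways other than removing motion, which is the kind of artifact the abstract says the proposed frameworks are designed to avoid.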
