What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

Abstract

The ability to capture temporal information has been critical to the development of video understanding models. While there have been numerous attempts at modeling motion in videos, an explicit analysis of the effect of temporal information for video understanding is still missing. In this work, we aim to bridge this gap and ask the following question: How important is the motion in the video for recognizing the action? To this end, we propose two novel frameworks: (i) class-agnostic temporal generator and (ii) motion-invariant frame selector to reduce/remove motion for an ablation analysis without introducing other artifacts. This isolates the analysis of motion from other aspects of the video. The proposed frameworks provide a much tighter estimate of the effect of motion (from 25% to 6% on UCF101 and 15% to 5% on Kinetics) compared to baselines in our analysis. Our analysis provides critical insights about existing models like C3D, and how it could be made to achieve comparable results with a sparser set of frames.
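The percentages above are easiest to read as accuracy drops, assuming (as the figures suggest) that the effect of motion is measured as how much recognition accuracy falls when motion is taken away. The Python sketch below illustrates that measurement with a naive ablation, assumed here purely for illustration, that replaces every frame of a clip with one repeated frame. Such a frozen clip can change more than just the motion, which is the kind of confound the proposed frameworks aim to avoid. The helpers `make_static_clip`, `accuracy`, and `motion_effect`, and the treatment of a model as any callable mapping a clip to a label, are hypothetical. This is not the paper's class-agnostic temporal generator or motion-invariant frame selector.

```python
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical interface: a "model" is any callable that maps a clip
# (a sequence of frames) to a predicted label.
Model = Callable[[Sequence[Any]], Any]


def make_static_clip(clip: Sequence[Any], index: Optional[int] = None) -> List[Any]:
    """Naive motion removal (assumed baseline): repeat one frame for the whole clip."""
    if len(clip) == 0:
        raise ValueError("clip must contain at least one frame")
    if index is None:
        index = len(clip) // 2  # default to the middle frame
    return [clip[index]] * len(clip)


def accuracy(model: Model, clips: Sequence[Sequence[Any]], labels: Sequence[Any]) -> float:
    """Fraction of clips whose predicted label matches the ground truth."""
    if len(clips) != len(labels) or len(labels) == 0:
        raise ValueError("need the same, non-zero number of clips and labels")
    correct = sum(1 for clip, label in zip(clips, labels) if model(clip) == label)
    return correct / len(labels)


def motion_effect(model: Model, clips: Sequence[Sequence[Any]], labels: Sequence[Any]) -> float:
    """Accuracy drop, in percentage points, when motion is removed from every clip."""
    full = accuracy(model, clips, labels)
    static = accuracy(model, [make_static_clip(clip) for clip in clips], labels)
    return 100.0 * (full - static)
```

For example, a toy model that labels a clip "moving" whenever its frames differ scores 100% on `[[0, 1, 2, 3], [5, 5, 5, 5]]` with labels `["moving", "static"]` but only 50% on the frozen versions, so `motion_effect` returns 50.0 percentage points. A smaller gap under a cleaner ablation would indicate that a naive baseline overstates the model's reliance on motion.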

Bibliographic Details

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2018 | pp. 7366-7375 | 10 pages
Conference location: Salt Lake City (US)
Authors
De-An Huang; Vignesh Ramanathan; Dhruv Mahajan; Lorenzo Torresani; Manohar Paluri; Li Fei-Fei; Juan Carlos Niebles
Original format: PDF
Keywords
Analytical models; Generators; Kinetic theory; Visualization; Upper bound; Testing; Training

Similar Documents

1. Contextualized Videos: Combining Videos with Environment Models to Support Situational Understanding [J]. Yi Wang, David M. Krum, Enylton M. Coelho. IEEE Transactions on Visualization and Computer Graphics, 2007, no. 6
2. Selection and validation of emotional videos: Dataset of professional and amateur videos that elicit basic emotions [J]. HongYi Chen, Kai Ling Chin, Chrystalle B.Y. Tan. Data in Brief, 2021, no. 3
3. Video tampering dataset development in temporal domain for video forgery authentication [J]. Hitesh D. Panchal, Hitesh B. Shah. Multimedia Tools and Applications, 2020, no. 33-34
4. What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [C]. De-An Huang, Vignesh Ramanathan, Dhruv Mahajan. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
5. Analyzing Human Activities in Videos using Component Based Models [D]. Furqan Muhammad Khan. 2013
6. Selection and validation of emotional videos: Dataset of professional and amateur videos that elicit basic emotions [O]. HongYi Chen, Kai Ling Chin, Chrystalle B.Y. Tan. 2021
7. Contextualized Videos: Combining Videos with Environment Models to Support Situational Understanding [O]. Yi Wang, David M. Krum, Enylton M. Coelho. 2007