European Conference on Computer Vision (ECCV)

Temporal Aggregate Representations for Long-Range Video Understanding



Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended for video segmentation and action recognition.
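
To make the idea concrete, the sketch below shows one possible form of multi-granular temporal aggregation: pre-extracted frame features are max-pooled over recent windows of several lengths, and the per-scale summaries are fused with an attention layer before predicting the next action. This is a minimal illustrative sketch, not the authors' architecture; the module name `TemporalAggregateSketch`, the window lengths, feature dimensions, and the attention-based fusion scheme are all assumptions made here for demonstration.

```python
# Minimal sketch (assumed, not the paper's code) of multi-granular
# temporal aggregation with max-pooling and attention for next-action
# anticipation. All dimensions and names are illustrative.
import torch
import torch.nn as nn


class TemporalAggregateSketch(nn.Module):
    def __init__(self, feat_dim=400, hidden_dim=256, num_classes=48,
                 scales=(10, 20, 30)):
        super().__init__()
        self.scales = scales  # window lengths (in frames) per granularity
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                          batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, feat_dim) pre-extracted frame features
        pooled = []
        for w in self.scales:
            # max-pool the most recent w observed frames into one summary
            recent = frames[:, -w:, :]
            pooled.append(recent.max(dim=1).values)
        spans = self.proj(torch.stack(pooled, dim=1))  # (B, scales, hidden)
        # attention lets each temporal scale attend to the others
        fused, _ = self.attn(spans, spans, spans)
        summary = fused.mean(dim=1)
        return self.classifier(summary)  # next-action logits


if __name__ == "__main__":
    model = TemporalAggregateSketch()
    video = torch.randn(2, 90, 400)  # 2 clips, 90 observed frames
    print(model(video).shape)        # torch.Size([2, 48])
```

The design choice mirrors the abstract's claim: max-pooling keeps each temporal scale cheap to summarize, while a simple attention layer is enough to weigh the scales against one another before classification.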
