European Conference on Computer Vision (ECCV)

Temporal Aggregate Representations for Long-Range Video Understanding



Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended for video segmentation and action recognition.
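
To make the idea concrete, the sketch below shows one possible form of multi-granular temporal aggregation: pre-extracted frame features are max-pooled over recent windows of several lengths, and the per-scale summaries are fused with an attention layer before predicting the next action. This is a minimal illustrative sketch, not the authors' architecture; the module name `TemporalAggregateSketch`, the window lengths, feature dimensions, and the attention-based fusion scheme are all assumptions made here for demonstration.

```python
# Minimal sketch (assumed, not the paper's code) of multi-granular
# temporal aggregation with max-pooling and attention for next-action
# anticipation. All dimensions and names are illustrative.
import torch
import torch.nn as nn


class TemporalAggregateSketch(nn.Module):
    def __init__(self, feat_dim=400, hidden_dim=256, num_classes=48,
                 scales=(10, 20, 30)):
        super().__init__()
        self.scales = scales  # window lengths (in frames) per granularity
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                          batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, feat_dim) pre-extracted frame features
        pooled = []
        for w in self.scales:
            # max-pool the most recent w observed frames into one summary
            recent = frames[:, -w:, :]
            pooled.append(recent.max(dim=1).values)
        spans = self.proj(torch.stack(pooled, dim=1))  # (B, scales, hidden)
        # attention lets each temporal scale attend to the others
        fused, _ = self.attn(spans, spans, spans)
        summary = fused.mean(dim=1)
        return self.classifier(summary)  # next-action logits


if __name__ == "__main__":
    model = TemporalAggregateSketch()
    video = torch.randn(2, 90, 400)  # 2 clips, 90 observed frames
    print(model(video).shape)        # torch.Size([2, 48])
```

The design choice mirrors the abstract's claim: max-pooling keeps each temporal scale cheap to summarize, while a simple attention layer is enough to weigh the scales against one another before classification.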
