
I2Net: Mining intra-video and inter-video attention for temporal action localization



Abstract

This paper addresses two challenges in temporal action localization: the lack of long-term relationships and action pattern uncertainty. The former prevents cooperation among multiple action instances within a video, while the latter can cause incomplete localizations or false positives. The lack of long-term relationships stems from the limited receptive field. Instead of stacking multiple layers or using large convolution kernels, we propose an intra-video attention mechanism that brings a global receptive field to each temporal point. As for action pattern uncertainty, although it is hard to depict the desired action pattern precisely, paired videos that share the same action category can provide complementary information about it. Consequently, we propose an inter-video attention mechanism to assist in learning accurate action patterns. Based on intra-video and inter-video attention, we propose a unified framework, I2Net, to tackle the challenging temporal action localization task. Given two videos that share action categories, I2Net adopts the widely used one-stage action localization paradigm to process them in parallel. Between two neighboring layers within the same video, intra-video attention brings global information to each temporal point and helps to learn representative features. Between two parallel layers across the two videos, inter-video attention introduces complementary information to each video and helps to learn accurate action patterns. With the cooperation of the intra-video and inter-video attention mechanisms, I2Net shows clear performance gains over the baseline and sets a new state of the art on two widely used benchmarks, THUMOS14 and ActivityNet v1.3. (c) 2021 Elsevier B.V. All rights reserved.
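The abstract does not give the paper's exact formulation, but the two mechanisms it describes follow the familiar dot-product attention pattern: intra-video attention lets every temporal point attend to all other points of the same video (a global receptive field), while inter-video attention lets each point gather complementary evidence from a paired video of the same action category. A minimal numpy sketch under that assumption (feature shapes, residual connection, and function names are illustrative, not the paper's):

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_video_attention(x):
    # x: (T, C) temporal features of one video.
    # Each temporal point attends to every other point of the same
    # video, giving a global receptive field in a single layer.
    scores = x @ x.T / np.sqrt(x.shape[1])   # (T, T) pairwise affinities
    return x + softmax(scores) @ x           # residual aggregation

def inter_video_attention(x, y):
    # x: (Tx, C), y: (Ty, C) features of two videos that share an
    # action category; points in x gather complementary cues from y.
    scores = x @ y.T / np.sqrt(x.shape[1])   # (Tx, Ty) cross affinities
    return x + softmax(scores) @ y           # residual aggregation

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))    # video A: 8 temporal points, 16-d features
y = rng.standard_normal((10, 16))   # video B: 10 temporal points, 16-d features
print(intra_video_attention(x).shape)     # (8, 16)
print(inter_video_attention(x, y).shape)  # (8, 16)
```

Note that the output keeps the shape of the query video's features, so either attention block can be dropped between existing layers of a one-stage localization pipeline without changing the downstream heads.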

Bibliographic information

  • Source
    Neurocomputing | 2021, Issue 15 | pp. 16-29 | 14 pages
  • Author affiliations

    Northwestern Polytechnical University, National Key Laboratory of Science and Technology on UAV, Xi'an 710072, China;

    Northwestern Polytechnical University, School of Automation, Xi'an 710072, China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Format: PDF
  • Language: English
  • Keywords

    Intra-video attention; Inter-video attention; Temporal action localization;

