Weakly Supervised Temporal Action Localization by Multi-Stage Fusion Network

Abstract

Most temporal action localization methods are trained on video datasets with frame-wise annotations, which are expensive and time-consuming to acquire. To alleviate this problem, many weakly supervised temporal action localization methods have been proposed that leverage only video-level annotations during training. In this paper, we first analyze three problems of weakly supervised temporal action localization: feature similarity, action completeness, and weak annotation. Based on these three problems, we propose a novel network, the multi-stage fusion network, which addresses them in three different modules: the feature, sub-action, and action modules. Specifically, for feature similarity, a triplet loss is introduced in the feature module to ensure that action instances from the same class have similar feature sequences and to enlarge the margin between action instances from different classes. For action completeness, each stage of the sub-action module discovers different sub-actions, and complete action instances are localized in the action module by fusing multiple sub-actions from the sub-action module. To alleviate weak annotation, we localize multiple action proposals from the multi-stage outputs of the network in the action module and select the proposals with higher confidence scores as predicted action instances. Extensive experimental results on the Thumos'14 and ActivityNet1.2 datasets demonstrate that our method outperforms state-of-the-art methods; the average mean Average Precision (mAP) on Thumos'14 improves from 40.9% to 43.3%.
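
The abstract does not give the exact form of the triplet loss used in the feature module. As a rough illustration only, the sketch below shows the standard margin-based triplet loss on three feature vectors. The Euclidean distance, the margin value, the function names, and the assumption that each action instance is summarized by a single feature vector are all illustrative choices, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (an assumed formulation).

    anchor and positive stand for features of action instances from the
    same class; negative stands for an instance from a different class.
    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D features, chosen only to make the arithmetic easy to follow.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1.0 from the anchor
far_negative = [3.0, 4.0]   # distance 5.0: margin already satisfied
near_negative = [0.0, 1.5]  # distance 1.5: margin violated

print(triplet_loss(anchor, positive, far_negative))
print(triplet_loss(anchor, positive, near_negative))
```

Minimizing this quantity over many triplets pulls same-class features together and pushes different-class features apart, which matches the goal the abstract states for the feature module.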

Bibliographic record

  • Source
    Quality Control, Transactions | 2020, Issue 2020 | pp. 17287-17298 | 12 pages
  • Author affiliations

    Chinese Acad Sci Hefei Inst Phys Sci Inst Plasma Phys Hefei 230031 Peoples R China|Univ Sci & Technol China Grad Sch Sci Isl Branch Hefei 230026 Peoples R China;

    Chinese Acad Sci Hefei Inst Phys Sci Inst Plasma Phys Hefei 230031 Peoples R China;

    Chinese Acad Sci Hefei Inst Phys Sci Inst Plasma Phys Hefei 230031 Peoples R China|Univ Sci & Technol China Grad Sch Sci Isl Branch Hefei 230026 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Temporal action localization; temporal action detection; weakly supervised;
