Two-stage transfer network for weakly supervised action localization


Abstract

Action localization is a central yet challenging task in video analysis. Most existing methods rely heavily on supervised learning, where an action label for each frame must be given beforehand. Unfortunately, for many real applications it is costly and resource-consuming to obtain frame-level action labels for untrimmed videos. In this paper, a novel two-stage paradigm that requires only video-level action labels is proposed for weakly supervised action localization. To this end, an Image-to-Video (I2V) network is first developed to transfer knowledge learned from the image domain (e.g., ImageNet) to the specific video domain. Relying on the model learned by the I2V network, a Video-to-Proposal (V2P) network is then designed to identify action proposals without the need for temporal annotations. Lastly, a proposal selection layer is devised on top of the V2P network to choose the maximal proposal response for each class, thus obtaining a video-level prediction score. By minimizing the difference between this prediction score and the video-level label, we fine-tune the V2P network to learn an enhanced discriminative ability for classifying proposal inputs. Extensive experimental results show that our method outperforms state-of-the-art approaches on ActivityNet1.2, and the mAP is improved from 13.7% to 16.2% on THUMOS14. More importantly, even with weak supervision, our networks attain accuracy comparable to those employing strong supervision, demonstrating the effectiveness of our method. (C) 2019 Elsevier B.V. All rights reserved.
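The proposal selection step described in the abstract (a class-wise maximum over proposal responses, trained against the video-level label only) can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation; the function names and the toy scores are invented for the example:

```python
import numpy as np

def proposal_selection(proposal_scores):
    """Class-wise max pooling over proposal responses.

    proposal_scores: (num_proposals, num_classes) array of per-proposal
    class responses, as would be produced by the V2P network.
    Returns a (num_classes,) video-level prediction score by taking the
    maximal proposal response for each class.
    """
    return proposal_scores.max(axis=0)

def weak_supervision_loss(video_scores, video_label):
    """Softmax cross-entropy between the video-level prediction and the
    video-level action label (the only supervision available)."""
    shifted = np.exp(video_scores - video_scores.max())  # stable softmax
    probs = shifted / shifted.sum()
    return -np.log(probs[video_label])

# Toy example: 4 proposals, 3 action classes.
scores = np.array([[0.2, 1.5, -0.3],
                   [0.9, 0.1,  0.4],
                   [0.1, 2.0,  0.0],
                   [0.5, 0.3,  1.1]])
video_scores = proposal_selection(scores)            # -> [0.9, 2.0, 1.1]
loss = weak_supervision_loss(video_scores, video_label=1)
```

Minimizing such a loss over many videos pushes the network to assign high responses only to proposals of the labeled action class, which is what lets the fine-tuned V2P network discriminate proposals without frame-level annotations.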

Bibliographic Record

  • Source
    Neurocomputing | 2019, Issue 28 | pp. 202-209 | 8 pages
  • Author

    Su Qiubin;

  • Affiliation

    South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Format: PDF
  • Language: eng
  • CLC Classification:
  • Keywords

    Weakly supervised learning; Action localization; Untrimmed videos;

  • Added to database: 2022-08-18 22:26:39
