
PFWNet: Pretraining neural network via feature jigsaw puzzle for weakly-supervised temporal action localization



Abstract

Weakly supervised temporal action localization is a challenging yet interesting task. Existing methods usually apply a few temporal convolutional layers or linear layers to predict classification scores, so the model capacity is limited. Inspired by related research, increasing model capacity has the potential to improve localization performance. However, under the weakly supervised paradigm, video-level classification labels are insufficient to learn large-capacity models. The essential reason is that most inputs to action localization networks are high-level features extracted by video recognition models. Lacking off-the-shelf initialization weights, action localization networks have to be trained from scratch and can only explore low-capacity models. In this work, we are inspired by the self-supervised learning paradigm and propose to learn high-quality representative models by solving a feature jigsaw puzzle task. The proposed self-supervised pretraining process can explore networks with larger kernel sizes and deeper layers, which provides valuable initialization for action localization networks. In the implementation, we first discover potential action scopes by calculating motion intensity. Then, we cut features into snippets and permute them into an out-of-order sequence. We randomly discard frames at the boundaries between two snippets to guide the network toward learning high-level representations and to prevent information leakage. Moreover, because the number of potential permutations rises factorially with the number of snippets, we select a fixed number of permutation operations via the maximum Hamming distance criterion, which eases the learning process. Comprehensive experiments on two benchmarks demonstrate the effectiveness of pretraining for the weakly supervised action localization task, and the proposed method sets a new state of the art. (c) 2021 Elsevier B.V. All rights reserved.
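
For illustration only, the sketch below shows two of the steps described in the abstract in a hypothetical NumPy form: selecting a fixed set of snippet orderings by the maximum Hamming distance criterion, and building a shuffled feature sequence with frames discarded at snippet boundaries. The function names, parameters (n_snippets, n_perms, n_drop), and tensor shapes are assumptions for the sketch, not taken from the paper.

```python
import itertools
import numpy as np

def select_permutations(n_snippets, n_perms, seed=0):
    # Greedy farthest-point selection: repeatedly keep the candidate ordering
    # whose minimum Hamming distance to the already-selected set is largest.
    # Enumerating all n_snippets! orderings is only feasible for small n.
    rng = np.random.default_rng(seed)
    all_perms = np.array(list(itertools.permutations(range(n_snippets))))
    chosen = [all_perms[rng.integers(len(all_perms))]]
    for _ in range(n_perms - 1):
        dists = np.stack([(all_perms != p).sum(axis=1) for p in chosen])
        chosen.append(all_perms[dists.min(axis=0).argmax()])
    return np.stack(chosen)

def make_jigsaw_sample(features, perm, n_drop=1):
    # Cut a (T, C) feature sequence into len(perm) snippets, reorder them by
    # `perm`, and drop a few frames at each snippet boundary so the network
    # cannot rely on low-level continuity cues (information leakage).
    snippets = np.array_split(features, len(perm), axis=0)
    trimmed = [s[n_drop:-n_drop] if n_drop > 0 else s
               for s in (snippets[i] for i in perm)]
    return np.concatenate(trimmed, axis=0)

# Example usage (assumed shapes): 3 snippets, 6 permutation classes,
# a clip of 96 frames with 1024-D features per frame.
perms = select_permutations(n_snippets=3, n_perms=6, seed=0)
clip = np.random.randn(96, 1024)
label = 2                                  # index of the permutation class
sample = make_jigsaw_sample(clip, perms[label])
```

The pretraining network would then be trained to predict `label` from `sample`, which is the standard jigsaw-puzzle pretext formulation.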

Bibliographic details

  • Source
    Neurocomputing | 2021, Issue 5 | pp. 162-173 | 12 pages
  • Author affiliations

    Northwestern Polytech Univ, Shenzhen Res & Dev Inst, Shenzhen 518057, Peoples R China | Northwestern Polytech Univ, Sch Automat Engn, Xian 710072, Peoples R China;

    Northwestern Polytech Univ, Shenzhen Res & Dev Inst, Shenzhen 518057, Peoples R China | Northwestern Polytech Univ, Sch Automat Engn, Xian 710072, Peoples R China;

    Northwestern Polytech Univ, Sch Automat Engn, Xian 710072, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification (CLC)
  • Keywords

    Self-supervised learning; Temporal action localization; Weakly-supervised learning;


