Spatial-temporal pyramid based Convolutional Neural Network for action recognition


Abstract

Convolutional Neural Networks (CNNs) for action recognition usually rely on the top-level appearance features of video frames. However, such methods discard the implicit complementary advantages of appearance representations at different scales, which are effective for object detection, instance segmentation, and person re-identification. In this paper, a new spatial pyramid module is proposed to make full use of the inherent multi-scale information of CNNs at almost no extra cost: a bottom-up architecture with lateral connections is constructed to combine the high-, mid-, and low-level representations of a CNN into a hierarchical frame-level feature. Additionally, temporal relations at an appropriate timescale contribute to the identification of an action. To this end, we also propose a new temporal pyramid module in which the frame-level features belonging to one video are reused by pooling at various timescales to efficiently obtain snippet features of different temporal granularity. Snippet-relation reasoning then derives temporal relations at the different timescales and accumulates them for a comprehensive prediction. Unifying the proposed spatial and temporal pyramid modules, a novel network, the Spatial-Temporal Pyramid Network (S-TPNet), is proposed to extract spatial-temporal pyramid features for action recognition in videos. Unlike previous models, which boost performance at the cost of computation, S-TPNet can be trained end to end with great efficiency. Extensive experiments on Kinetics, UCF101, and HMDB51 demonstrate that S-TPNet achieves significant performance improvements over existing frameworks and is competitive with the state of the art. © 2019 Elsevier B.V. All rights reserved.
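
The temporal pyramid idea described in the abstract, reusing one set of frame-level features at several timescales, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_pool` and `temporal_pyramid`, the use of average pooling, the non-overlapping windows, and the timescales (1, 2, 4) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a non-empty group of equal-length feature vectors element-wise."""
    count = len(frames)
    return [sum(values) / count for values in zip(*frames)]


def temporal_pyramid(
    frame_features: Sequence[Vector],
    timescales: Sequence[int] = (1, 2, 4),  # assumed values, not from the paper
) -> Dict[int, List[Vector]]:
    """Pool the same frame-level features at several timescales.

    For each timescale t, consecutive frames are grouped into non-overlapping
    windows of t frames, and each window is averaged into one snippet feature.
    A trailing window shorter than t is still pooled (an illustrative choice).
    """
    if not frame_features:
        raise ValueError("frame_features must not be empty")
    pyramid: Dict[int, List[Vector]] = {}
    for t in timescales:
        if t < 1:
            raise ValueError("each timescale must be a positive integer")
        pyramid[t] = [
            mean_pool(frame_features[start:start + t])
            for start in range(0, len(frame_features), t)
        ]
    return pyramid


# Four frames, each with a two-dimensional (toy) frame-level feature.
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
pyramid = temporal_pyramid(frames)
for timescale, snippets in pyramid.items():
    print(timescale, snippets)
```

In this sketch the frame-level features are computed once and only the cheap pooling step is repeated per timescale, which mirrors the efficiency argument in the abstract. The snippet-relation reasoning that would consume these snippet features is not shown.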

Bibliographic record

  • Source
    Neurocomputing | 2019, Issue 17 | pp. 446-455 | 10 pages
  • Author affiliations

    Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China|Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China;

    Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China|Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China;

    Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA;

    Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China|Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China;

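  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords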

    Feature pyramid; Action recognition; Relation network; Temporal pyramid;

