Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

A Closer Look at Spatiotemporal Convolutions for Action Recognition



Abstract

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, 'R(2+1)D', which produces CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101, and HMDB51.
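The factorization mentioned in the abstract can be illustrated with a small weight-count sketch. The abstract does not give the details, so everything here is an assumption: the full 3D filter is taken to be t x d x d, the factorized version is taken to be a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution, and the intermediate channel count is chosen so that the factorized pair uses no more weights than the full 3D filter. The function names are hypothetical and biases are ignored.

```python
def conv3d_params(n_in, n_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return n_in * n_out * t * d * d


def factorized_params(n_in, n_out, n_mid, t=3, d=3):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    spatial = n_in * n_mid * d * d   # 2D filters applied per frame
    temporal = n_mid * n_out * t     # 1D filters applied across frames
    return spatial + temporal


def matched_mid_channels(n_in, n_out, t=3, d=3):
    """Largest intermediate width whose factorized weight count
    does not exceed that of the full 3D convolution (assumed rule)."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


if __name__ == "__main__":
    n_in, n_out = 64, 64
    n_mid = matched_mid_channels(n_in, n_out)
    print("intermediate channels:", n_mid)
    print("full 3D weights:      ", conv3d_params(n_in, n_out))
    print("factorized weights:   ", factorized_params(n_in, n_out, n_mid))
```

Under these assumptions the two designs have the same weight budget, so any accuracy difference would come from the structure of the factorization (for example, the extra nonlinearity that can be placed between the spatial and temporal steps) rather than from model size.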
