Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

A Closer Look at Spatiotemporal Convolutions for Action Recognition

Abstract

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block "R(2+1)D" which produces CNNs that achieve results comparable or superior to the state-of-the-art on Sports-1M, Kinetics, UCF101, and HMDB51.
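
The factorization mentioned in the abstract can be illustrated by counting weights. The sketch below assumes that a full 3D filter of size t x d x d is replaced by a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution, with `mid` intermediate channels between them. The function names, the rule used to pick `mid`, and the example sizes are illustrative assumptions and are not stated in the abstract.

```python
def conv3d_params(c_in, c_out, t, d):
    """Weights in one full t x d x d 3D convolution (bias ignored)."""
    return c_in * c_out * t * d * d


def conv2plus1d_params(c_in, c_out, t, d, mid):
    """Weights in a factorized block: spatial 1 x d x d, then temporal t x 1 x 1."""
    spatial = c_in * mid * d * d   # 2D convolution applied within each frame
    temporal = mid * c_out * t     # 1D convolution applied across frames
    return spatial + temporal


def matched_mid_channels(c_in, c_out, t, d):
    """Assumed rule: largest `mid` whose factorized block has no more
    weights than the full 3D convolution it replaces."""
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)


# Hypothetical layer sizes chosen only for illustration.
c_in, c_out, t, d = 64, 64, 3, 3
mid = matched_mid_channels(c_in, c_out, t, d)

print(conv3d_params(c_in, c_out, t, d))            # 110592
print(mid)                                         # 144
print(conv2plus1d_params(c_in, c_out, t, d, mid))  # 110592
```

With these example sizes the factorized block has the same number of weights as the full 3D convolution, so the comparison isolates the effect of splitting the filter rather than a change in model capacity.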