Conference: European Conference on Computer Vision (ECCV)

Spatio-temporal Channel Correlation Networks for Action Classification


Abstract

This paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNeXt and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D CNNs is that they must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
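The abstract does not spell out the internals of the STC block, so the following is only a rough sketch of the general idea it describes: per-channel descriptors pooled over space and time are mixed across channels, and the result rescales the features inside a residual connection. The names `stc_like_block`, `channel_gate` and `init_params`, the sigmoid gating, the bottleneck ratio, the frame-difference temporal descriptor and the averaging of the two branches are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random

def init_params(channels, reduction, seed=0):
    """Random bottleneck weights for two hypothetical branches."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    def branch():
        w1 = [[rng.uniform(-0.5, 0.5) for _ in range(channels)] for _ in range(hidden)]
        w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(channels)]
        return w1, w2
    return {"spatial": branch(), "temporal": branch()}

def channel_gate(desc, w1, w2):
    """Bottleneck MLP (ReLU, then sigmoid) that mixes channel descriptors
    into one gate per channel, so each gate depends on all channels."""
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]

def frame_means(channel):
    """Spatial mean of every frame of one channel; returns a list of length T."""
    return [sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
            for frame in channel]

def stc_like_block(x, params):
    """x is a nested list shaped [C][T][H][W]; the output has the same shape."""
    per_frame = [frame_means(ch) for ch in x]
    # Assumed spatial descriptor: mean over all frames and positions.
    spatial_desc = [sum(m) / len(m) for m in per_frame]
    # Assumed temporal descriptor: mean absolute frame-to-frame change.
    temporal_desc = [sum(abs(b - a) for a, b in zip(m, m[1:])) / max(1, len(m) - 1)
                     for m in per_frame]
    g_s = channel_gate(spatial_desc, *params["spatial"])
    g_t = channel_gate(temporal_desc, *params["temporal"])
    # Residual unit: identity path plus the gated features.
    return [[[[v + v * 0.5 * (gs + gt) for v in row] for row in frame] for frame in ch]
            for ch, gs, gt in zip(x, g_s, g_t)]

# Usage: 4 channels, 4 frames, 3x3 spatial resolution.
rng = random.Random(0)
x = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
      for _ in range(4)] for _ in range(4)]
y = stc_like_block(x, init_params(channels=4, reduction=2, seed=1))
```

Because the gated features are added to the identity path, the block preserves the input shape, which is what would let a unit of this kind be dropped into different stages of an existing 3D ResNet or ResNeXt.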
