Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

Abstract

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow 3D architectures. We examine the architectures of various 3D CNNs, from relatively shallow to very deep ones, on current video datasets. Based on the results of those experiments, the following conclusions could be obtained: (i) ResNet-18 training resulted in significant overfitting for UCF-101, HMDB-51, and ActivityNet but not for Kinetics. (ii) The Kinetics dataset has sufficient data for training deep 3D CNNs and enables training of ResNets with up to 152 layers, interestingly similar to 2D ResNets on ImageNet. ResNeXt-101 achieved 78.4% average accuracy on the Kinetics test set. (iii) Kinetics-pretrained simple 3D architectures outperform complex 2D architectures, and the pretrained ResNeXt-101 achieved 94.5% and 70.2% on UCF-101 and HMDB-51, respectively. The use of 2D CNNs trained on ImageNet has produced significant progress in various image tasks. We believe that using deep 3D CNNs together with Kinetics will retrace the successful history of 2D CNNs and ImageNet and stimulate advances in computer vision for videos. The code and pretrained models used in this study are publicly available.
