Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

Abstract

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow 3D architectures. We examine the architectures of various 3D CNNs, from relatively shallow to very deep ones, on current video datasets. Based on the results of those experiments, the following conclusions could be obtained: (i) ResNet-18 training resulted in significant overfitting for UCF-101, HMDB-51, and ActivityNet but not for Kinetics. (ii) The Kinetics dataset has sufficient data for training of deep 3D CNNs, and enables training of ResNets with up to 152 layers, interestingly similar to 2D ResNets on ImageNet. ResNeXt-101 achieved 78.4% average accuracy on the Kinetics test set. (iii) Kinetics-pretrained simple 3D architectures outperform complex 2D architectures, and the pretrained ResNeXt-101 achieved 94.5% and 70.2% on UCF-101 and HMDB-51, respectively. The use of 2D CNNs trained on ImageNet has produced significant progress in various image tasks. We believe that using deep 3D CNNs together with Kinetics will retrace the successful history of 2D CNNs and ImageNet, and stimulate advances in computer vision for videos. The code and pretrained models used in this study are publicly available.
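
As a rough illustration of why spatio-temporal 3D kernels raise the question of dataset size, the sketch below counts the weights of a 2D and a 3D convolution layer and computes the output shape of a 3D convolution over a video clip. It is not taken from the authors' code: the function names, the 64-channel layer, and the 16-frame 112 x 112 clip are assumptions chosen only for the example.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a convolution layer.

    kernel_size is a tuple: (h, w) for a 2D kernel, (t, h, w) for a 3D kernel.
    """
    weights = in_channels * out_channels
    for k in kernel_size:
        weights *= k
    return weights + (out_channels if bias else 0)


def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer: 64 input channels, 64 output channels, no bias.
params_2d = conv_params(64, 64, (3, 3), bias=False)
params_3d = conv_params(64, 64, (3, 3, 3), bias=False)

# Hypothetical clip: 16 frames of 112 x 112 pixels,
# convolved with a 3 x 3 x 3 kernel, stride 1, padding 1 on every axis.
clip_shape = (16, 112, 112)
out_shape = tuple(conv_output_size(s, 3, stride=1, padding=1) for s in clip_shape)

print(params_2d, params_3d, params_3d / params_2d)
print(out_shape)
```

With the same channel counts, the 3 x 3 x 3 kernel carries three times the weights of the 3 x 3 kernel, which is one reason a deep 3D network plausibly needs a larger dataset than its 2D counterpart.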


Bibliographic record

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2018 | pp. 6546-6555 | 10 pages
Conference location: Salt Lake City (US)
Authors
Kensho Hara; Hirokatsu Kataoka; Yutaka Satoh
Original format: PDF
Keywords
Three-dimensional displays; Kinetic theory; Two dimensional displays; Training; Task analysis; Kernel; Computer vision
Indexed: 2022-08-26 14:35:28

Similar literature


1. Action Recognition Using Multi-Scale Temporal Shift Module and Temporal Feature Difference Extraction Based on 2D CNN [J] . Kun-Hsuan Wu, Ching-Te Chiu Journal of Software Engineering and Applications . 2021, No. 5
2. A study on CNN image classification of EEG signals represented in 2D and 3D [J] . Jordan J Bird, Diego R Faria, Luis J Manso, Journal of neural engineering . 2021, No. 2
3. Integration of 2D iteration and a 3D CNN-based model for multi-type artifact suppression in C-arm cone-beam CT [J] . Dahim Choi, Wonjin Kim, Jiyeon Lee, Machine Vision and Applications . 2021, No. 6
4. LNCDS: A 2D-3D cascaded CNN approach for lung nodule classification, detection and segmentation [J] . Dutande Prasad, Baid Ujjwal, Talbar Sanjay Biomedical signal processing and control . 2021, May issue
5. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [C] . Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh IEEE/CVF Conference on Computer Vision and Pattern Recognition . 2018
6. Augmented Dual Input CNN (DI-CNN) for the Diagnostic Classification of Lung Nodule Malignancy from CT Scans [D] . Jain, Arshita. 2020
7. Spatial–Spectral Feature Refinement for Hyperspectral Image Classification Based on Attention-Dense 3D-2D-CNN [O] . Jin Zhang, Fengyuan Wei, Fan Feng, 2020
8. Spatiotemporal Fusion in 3D CNNs: A Probabilistic View [O] . Yizhou Zhou, Xiaoyan Sun, Chong Luo, 2020
