IEEE/CVF Conference on Computer Vision and Pattern Recognition

Appearance-and-Relation Networks for Video Classification

Abstract

Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed the Appearance-and-Relation Network (ARTNet), that learns video representations in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART blocks, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, a SMART block decouples the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented as a linear combination of pixels or filter responses within each frame, while the relation branch is designed around multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks yield an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve performance superior to existing state-of-the-art methods on these three datasets.
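The abstract's description of a SMART block suggests a two-branch layout: a purely spatial convolution for the appearance branch and a spatiotemporal convolution whose responses interact multiplicatively for the relation branch. The sketch below is a minimal PyTorch rendering of that description only; the kernel shapes, channel widths, batch-normalization and ReLU placement, the use of squared responses to realize the multiplicative interactions, and the concatenation-plus-1x1x1 fusion are illustrative assumptions rather than the authors' exact configuration, and the class name SMARTBlockSketch is hypothetical.

```python
# Minimal sketch of a SMART-style block, based only on the abstract's wording.
# All architectural details here are assumptions made for illustration.
import torch
import torch.nn as nn


class SMARTBlockSketch(nn.Module):
    """Appearance branch (per-frame spatial conv) plus relation branch
    (spatiotemporal conv whose responses are squared, so products of inputs
    from different frames appear in the output), fused by concatenation."""

    def __init__(self, in_channels: int, out_channels: int,
                 spatial_kernel: int = 3, temporal_kernel: int = 3):
        super().__init__()
        # Appearance branch: a linear combination of pixels / filter responses
        # within each frame, i.e. a 1 x k x k (purely spatial) convolution.
        self.appearance = nn.Sequential(
            nn.Conv3d(in_channels, out_channels,
                      kernel_size=(1, spatial_kernel, spatial_kernel),
                      padding=(0, spatial_kernel // 2, spatial_kernel // 2),
                      bias=False),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Relation branch: a t x k x k convolution spanning several frames;
        # squaring its responses introduces multiplicative cross-frame terms.
        self.relation_conv = nn.Conv3d(
            in_channels, out_channels,
            kernel_size=(temporal_kernel, spatial_kernel, spatial_kernel),
            padding=(temporal_kernel // 2, spatial_kernel // 2,
                     spatial_kernel // 2),
            bias=False)
        self.relation_bn = nn.BatchNorm3d(out_channels)
        # Fuse the two branches and reduce back to out_channels.
        self.reduce = nn.Sequential(
            nn.Conv3d(2 * out_channels, out_channels, kernel_size=1,
                      bias=False),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        app = self.appearance(x)
        rel = torch.relu(self.relation_bn(self.relation_conv(x) ** 2))
        return self.reduce(torch.cat([app, rel], dim=1))


if __name__ == "__main__":
    block = SMARTBlockSketch(in_channels=3, out_channels=64)
    clip = torch.randn(2, 3, 16, 112, 112)   # two 16-frame RGB clips
    print(block(clip).shape)                  # torch.Size([2, 64, 16, 112, 112])
```

Stacking several such blocks with pooling or strided convolutions between them would give an ARTNet-style backbone; depths, strides, and the exact fusion scheme should be taken from the original paper rather than this sketch.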
