Representing Motion Features Using Residual Frames with 3D ConvNets for Action Recognition

Abstract

Recently, 3D convolutional networks (3D ConvNets) have achieved strong performance in action recognition. However, an optical flow stream is still needed to reach the best accuracy, and computing optical flow is very expensive. In this paper, we propose a fast yet effective way to extract motion features from videos by using residual frames as the input to 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, top-1 accuracy improves by 14.5 and 12.5 percentage points on the UCF101 and HMDB51 datasets when trained from scratch. Because residual frames contain little information about object appearance, we further use a 2D convolutional network to extract appearance features and combine them with the results from residual frames to form a two-path solution. On three benchmark datasets, our two-path solution achieves performance better than or comparable to methods that rely on additional optical flow, and in particular outperforms state-of-the-art models on the Mini-Kinetics dataset. Further analysis indicates that better motion features can be extracted using residual frames with 3D ConvNets, and that our residual-frame-input path is a good supplement to existing RGB-frame-input models.
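The core input transform the abstract describes is simple: a residual frame is the difference between adjacent frames, so static appearance largely cancels while motion is preserved. The sketch below illustrates this idea with NumPy; it is an illustrative reconstruction under stated assumptions, not the authors' exact preprocessing pipeline.

```python
import numpy as np

def residual_frames(clip):
    """Compute residual frames from a stacked RGB clip.

    clip: array of shape (T, H, W, C) holding T consecutive RGB frames.
    Returns a (T-1, H, W, C) array of adjacent-frame differences.
    Static background subtracts to zero; moving regions remain.
    (Illustrative sketch of the residual-frame input, not the
    paper's exact implementation.)
    """
    clip = clip.astype(np.float32)
    return clip[1:] - clip[:-1]

# Toy example: four 2x2 "frames" with one pixel moving between rows.
clip = np.zeros((4, 2, 2, 3), dtype=np.uint8)
for t in range(4):
    clip[t, t % 2, 0, :] = 255  # bright pixel alternates rows over time
res = residual_frames(clip)
print(res.shape)  # (3, 2, 2, 3): one residual per adjacent frame pair
```

The stack of residuals can then be fed to a 3D ConvNet in place of the stacked RGB frames; in the two-path setup, a 2D ConvNet on an RGB frame supplies the appearance features that the residuals discard.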
