Representing Motion Features Using Residual Frames with 3D ConvNets for Action Recognition

Abstract

Recently, 3D convolutional networks (3D ConvNets) have achieved strong performance in action recognition. However, an optical flow stream is still needed to reach the best accuracy, and computing optical flow is very expensive. In this paper, we propose a fast yet effective way to extract motion features from videos by using residual frames as the input to 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, top-1 accuracy improves by 14.5 and 12.5 percentage points on the UCF101 and HMDB51 datasets when trained from scratch. Because residual frames contain little information about object appearance, we further use a 2D convolutional network to extract appearance features and combine them with the results from residual frames to form a two-path solution. On three benchmark datasets, our two-path solution achieves performance better than or comparable to methods that rely on additional optical flow, and in particular outperforms state-of-the-art models on the Mini-Kinetics dataset. Further analysis indicates that better motion features can be extracted using residual frames with 3D ConvNets, and that our residual-frame-input path is a good supplement to existing RGB-frame-input models.
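The core input transform the abstract describes is simple: a residual frame is the difference between adjacent frames, so static appearance largely cancels while motion is preserved. The sketch below illustrates this idea with NumPy; it is an illustrative reconstruction under stated assumptions, not the authors' exact preprocessing pipeline.

```python
import numpy as np

def residual_frames(clip):
    """Compute residual frames from a stacked RGB clip.

    clip: array of shape (T, H, W, C) holding T consecutive RGB frames.
    Returns a (T-1, H, W, C) array of adjacent-frame differences.
    Static background subtracts to zero; moving regions remain.
    (Illustrative sketch of the residual-frame input, not the
    paper's exact implementation.)
    """
    clip = clip.astype(np.float32)
    return clip[1:] - clip[:-1]

# Toy example: four 2x2 "frames" with one pixel moving between rows.
clip = np.zeros((4, 2, 2, 3), dtype=np.uint8)
for t in range(4):
    clip[t, t % 2, 0, :] = 255  # bright pixel alternates rows over time
res = residual_frames(clip)
print(res.shape)  # (3, 2, 2, 3): one residual per adjacent frame pair
```

The stack of residuals can then be fed to a 3D ConvNet in place of the stacked RGB frames; in the two-path setup, a 2D ConvNet on an RGB frame supplies the appearance features that the residuals discard.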
