Conference: Chinese Conference on Pattern Recognition and Computer Vision

Multi-level Three-Stream Convolutional Networks for Video-Based Action Recognition

Abstract

Deep convolutional neural networks (ConvNets) have shown remarkable capability for visual feature learning and representation, and they have driven much of the recent progress in video-based action recognition. However, the mainstream ConvNets used for this task, such as two-stream ConvNets and 3D ConvNets, still lack the ability to represent fine-grained features. In this paper, we propose a novel architecture named the multi-level three-stream convolutional network (MLTSN), which contains three streams: the spatial stream, the temporal stream, and the multi-level correlation stream (MLCS). The MLCS contains several correlation modules, each of which fuses appearance and motion features taken from the same level of the spatial and temporal streams to produce a spatial-temporal correlation map. The correlation maps are then fed into several convolution layers to obtain refined features. The whole network is trained in a multi-step manner. Extensive experimental results show that the proposed network is competitive with state-of-the-art methods on HMDB51 and UCF101.
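To make the idea of a correlation module concrete, here is a minimal sketch in plain Python. The abstract does not say how the correlation is computed, so this sketch assumes a per-location cosine similarity between the appearance and motion feature vectors. The function names `correlation_map` and `multi_level_correlation` and the nested-list `[channels][height][width]` layout are hypothetical and are not taken from the paper.

```python
import math


def correlation_map(appearance, motion, eps=1e-8):
    """Assumed correlation: cosine similarity across channels at each location.

    Both inputs are nested lists shaped [channels][height][width] and are
    expected to come from the same level of the spatial and temporal streams.
    Returns a [height][width] map with values in roughly [-1, 1].
    """
    channels = len(appearance)
    height = len(appearance[0])
    width = len(appearance[0][0])
    corr = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the channel vector at this spatial location for each stream.
            a = [appearance[c][y][x] for c in range(channels)]
            m = [motion[c][y][x] for c in range(channels)]
            dot = sum(p * q for p, q in zip(a, m))
            norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in m))
            corr[y][x] = dot / (norm + eps)  # eps guards against all-zero vectors
    return corr


def multi_level_correlation(appearance_levels, motion_levels):
    """Apply the assumed correlation at every level, pairing same-level features."""
    return [correlation_map(a, m) for a, m in zip(appearance_levels, motion_levels)]


# Toy example: 2 channels, 1x2 spatial grid, a single level.
appearance = [[[1.0, 0.0]], [[0.0, 1.0]]]
motion = [[[1.0, 0.0]], [[0.0, -1.0]]]
maps = multi_level_correlation([appearance], [motion])
```

In this toy input the two streams agree at the first location and point in opposite directions at the second, so the map is close to 1 and -1 respectively. In the architecture described above, such maps would then pass through further convolution layers; that refinement step is omitted here.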
