Conference: Asian Conference on Computer Vision

Spatio-Temporal Fusion Networks for Action Recognition


Abstract

Video-based CNN work has focused on effective ways to fuse appearance and motion networks, but it typically makes little use of temporal information across video frames. In this work, we present a novel spatio-temporal fusion network (STFN) that integrates the temporal dynamics of appearance and motion information from entire videos. The captured temporal dynamics are then aggregated into a better video-level representation and learned via end-to-end training. The spatio-temporal fusion network consists of two sets of Residual Inception blocks that extract temporal dynamics and a fusion connection for appearance and motion features. The benefits of STFN are: (a) it captures local and global temporal dynamics of complementary data to learn video-wide information; and (b) it is applicable to any network for video classification to boost performance. We explore a variety of design choices for STFN and use ablation studies to verify how network performance varies with them. We perform experiments on two challenging human activity datasets, UCF101 and HMDB51, and achieve state-of-the-art results with the best network.
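The data flow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the fixed (non-learned) temporal kernels, the averaging of parallel branches, the element-wise sum used as the fusion connection, and the temporal mean used for aggregation are all assumptions made for illustration only.

```python
from typing import List

Features = List[List[float]]  # T frames, each a D-dimensional feature vector


def temporal_conv(x: Features, kernel: List[float]) -> Features:
    """Zero-padded 1D convolution along the time axis, applied per feature dim."""
    num_frames, dim, pad = len(x), len(x[0]), len(kernel) // 2
    out = []
    for t in range(num_frames):
        row = [0.0] * dim
        for j, weight in enumerate(kernel):
            s = t + j - pad  # source frame index for this kernel tap
            if 0 <= s < num_frames:
                for d in range(dim):
                    row[d] += weight * x[s][d]
        out.append(row)
    return out


def residual_inception_block(x: Features, kernels: List[List[float]]) -> Features:
    """Parallel temporal convolutions of different widths, averaged, plus a skip.

    Assumption: branches are averaged; a real block would use learned weights.
    """
    branches = [temporal_conv(x, k) for k in kernels]
    return [
        [x[t][d] + sum(b[t][d] for b in branches) / len(branches)
         for d in range(len(x[0]))]
        for t in range(len(x))
    ]


def fuse(appearance: Features, motion: Features) -> Features:
    """Hypothetical fusion connection: element-wise sum of the two streams."""
    return [[a + m for a, m in zip(ra, rm)] for ra, rm in zip(appearance, motion)]


def video_level(x: Features) -> List[float]:
    """Aggregate per-frame features into one vector by averaging over time."""
    return [sum(column) / len(x) for column in zip(*x)]


def stfn_sketch(appearance: Features, motion: Features,
                kernels: List[List[float]]) -> List[float]:
    """One temporal block per stream, then fusion, then temporal aggregation."""
    a = residual_inception_block(appearance, kernels)
    m = residual_inception_block(motion, kernels)
    return video_level(fuse(a, m))


if __name__ == "__main__":
    appearance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy per-frame features
    motion = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
    # A short and a longer kernel stand in for "local" and "global" dynamics.
    print(stfn_sketch(appearance, motion, kernels=[[1.0], [1 / 3, 1 / 3, 1 / 3]]))
```

Using kernels of different temporal widths in parallel is one simple way to mix short-range and longer-range context; the paper's actual block design and fusion operator may differ.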
