Asian Conference on Computer Vision

Spatio-Temporal Fusion Networks for Action Recognition


Abstract

Video-based CNN approaches have focused on effective ways to fuse appearance and motion networks, but they typically do not exploit temporal information across video frames. In this work, we present a novel spatio-temporal fusion network (STFN) that integrates the temporal dynamics of appearance and motion information from entire videos. The captured temporal dynamics are then aggregated into a better video-level representation, which is learned via end-to-end training. STFN consists of two sets of Residual Inception blocks that extract temporal dynamics, and a fusion connection for appearance and motion features. The benefits of STFN are: (a) it captures local and global temporal dynamics of complementary data to learn video-wide information; and (b) it can be applied to any video classification network to boost performance. We explore a variety of design choices for STFN and use ablation studies to verify how network performance varies with them. We perform experiments on two challenging human activity datasets, UCF101 and HMDB51, and achieve state-of-the-art results with our best network.
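The abstract describes the pipeline only at a high level: per-stream Residual Inception blocks that model temporal dynamics, a fusion connection between appearance and motion, and aggregation into a video-level representation. The following is a minimal, dependency-free Python sketch of that data flow under stated assumptions, not the authors' implementation. Everything concrete in it is hypothetical: the function names (`temporal_conv`, `residual_inception_block`, `stfn_sketch`), averaging parallel temporal convolutions instead of concatenating them, fusion by element-wise sum, and average pooling over frames. Kernels of different lengths stand in for the local versus global temporal context the abstract mentions.

```python
"""Hypothetical sketch of a spatio-temporal fusion head (not the authors' code).

Assumptions (not stated in the abstract): per-frame appearance and motion
features are already extracted as T x C lists; each "Residual Inception"
block averages parallel temporal convolutions of different kernel sizes;
fusion is an element-wise sum; video-level aggregation is average pooling.
"""
import random
from typing import List

Sequence = List[List[float]]  # T frames x C channels


def temporal_conv(x: Sequence, kernel: List[float]) -> Sequence:
    """1-D convolution along time with zero padding, output length T.

    The same odd-length kernel is applied to every channel independently.
    """
    T, C, k = len(x), len(x[0]), len(kernel)
    pad = k // 2
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            s = 0.0
            for j in range(k):
                tt = t + j - pad
                if 0 <= tt < T:  # zero padding outside the clip
                    s += kernel[j] * x[tt][c]
            row.append(s)
        out.append(row)
    return out


def residual_inception_block(x: Sequence, kernels: List[List[float]]) -> Sequence:
    """Parallel temporal branches (short and long kernels), averaged, ReLU, plus skip."""
    branches = [temporal_conv(x, k) for k in kernels]
    T, C = len(x), len(x[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            merged = sum(b[t][c] for b in branches) / len(branches)
            # ReLU on the averaged branches, then the residual (skip) connection.
            row.append(x[t][c] + max(0.0, merged))
        out.append(row)
    return out


def stfn_sketch(appearance: Sequence, motion: Sequence,
                kernels: List[List[float]], num_blocks: int = 1) -> List[float]:
    """Return one video-level feature vector of length C."""
    for _ in range(num_blocks):  # one block stack per stream
        appearance = residual_inception_block(appearance, kernels)
        motion = residual_inception_block(motion, kernels)
    # Fusion connection: element-wise sum of the two streams (an assumption).
    fused = [[a + m for a, m in zip(ra, rm)] for ra, rm in zip(appearance, motion)]
    # Aggregate over all frames into a video-level representation.
    T = len(fused)
    return [sum(row[c] for row in fused) / T for c in range(len(fused[0]))]


if __name__ == "__main__":
    random.seed(0)
    T, C = 8, 4  # 8 frames, 4 channels (toy sizes)
    app = [[random.random() for _ in range(C)] for _ in range(T)]
    mot = [[random.random() for _ in range(C)] for _ in range(T)]
    kernels = [[1.0], [0.25, 0.5, 0.25], [0.2] * 5]  # local to wider temporal context
    video_vec = stfn_sketch(app, mot, kernels, num_blocks=2)
    print(len(video_vec))  # prints 4
```

A real model would learn the kernel weights end to end inside a deep-learning framework; here they are fixed by hand so the data flow from frame features to a single video-level vector is easy to trace.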

