Journal: Neural computing & applications

Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition

Abstract

Two-stream networks offer an alternative way of exploiting spatiotemporal information for action recognition. Nevertheless, most two-stream variants fuse homogeneous modalities, which cannot efficiently capture the action-motion dynamics in videos. Moreover, existing studies cannot extend the number of streams beyond the number of modalities. To address these limitations, we propose hybrid and hierarchical fusion (HHF) networks. The hybrid fusion handles non-homogeneous modalities and introduces a cross-modal learning stream for effective modeling of motion dynamics, extending the networks from existing two-stream variants to three and six streams. The hierarchical fusion, in turn, makes the modalities consistent by modeling long-term temporal information and combines multiple streams to improve recognition performance. The proposed network architecture comprises three fusion tiers: the hybrid fusion itself; the long-term fusion pooling layer, which models long-term dynamics from the RGB and optical flow modalities; and the adaptive weighting scheme, which combines the classification scores from several streams. We show that the hybrid fusion yields representations different from the base modalities for training the cross-modal learning stream. We have conducted extensive experiments showing that the proposed six-stream HHF network outperforms existing two- and four-stream networks, achieving state-of-the-art recognition performance with accuracies of 97.2% and 76.7% on the UCF101 and HMDB51 datasets, respectively, both of which are widely used in action recognition studies.
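
The third fusion tier, which combines classification scores from several streams with a weighting scheme, can be illustrated with a generic weighted late-fusion sketch. This is not the authors' method: the abstract does not say how the adaptive weights are obtained, so the weights here are supplied by hand, and the function names (`softmax`, `fuse_scores`, `predict`) and all numeric values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits, stream_weights):
    """Weighted average of per-stream class probabilities (late fusion).

    stream_logits: one list of class scores per stream.
    stream_weights: one non-negative weight per stream. These are
    hand-picked placeholders (assumption); the abstract does not
    describe how the adaptive weights are computed.
    """
    if len(stream_logits) != len(stream_weights):
        raise ValueError("need exactly one weight per stream")
    weight_sum = sum(stream_weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    num_classes = len(stream_logits[0])
    fused = [0.0] * num_classes
    for logits, weight in zip(stream_logits, stream_weights):
        probs = softmax(logits)
        for c in range(num_classes):
            # Normalise weights so the fused scores still sum to 1.
            fused[c] += (weight / weight_sum) * probs[c]
    return fused


def predict(stream_logits, stream_weights):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_logits, stream_weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Three hypothetical streams (e.g. RGB, optical flow, cross-modal),
    # four action classes, made-up scores and weights.
    logits = [
        [2.1, 0.3, 0.5, 0.1],
        [0.4, 1.9, 0.2, 0.3],
        [1.5, 1.2, 0.1, 0.2],
    ]
    weights = [0.5, 0.3, 0.2]
    print(fuse_scores(logits, weights), predict(logits, weights))
```

Normalising the weights keeps the fused output a valid probability distribution regardless of how many streams are combined, so the same function would apply to two, three, or six streams.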
