Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition

Abstract

Two-stream networks offer an alternative way of exploiting spatiotemporal information for the action recognition problem. Nevertheless, most two-stream variants fuse homogeneous modalities, which cannot efficiently capture the action-motion dynamics in videos. Moreover, existing studies cannot extend the number of streams beyond the number of modalities. To address these limitations, we propose hybrid and hierarchical fusion (HHF) networks. The hybrid fusion handles non-homogeneous modalities and introduces a cross-modal learning stream for effective modeling of motion dynamics, while extending the networks from existing two-stream variants to three and six streams. The hierarchical fusion, in turn, makes the modalities consistent by modeling long-term temporal information and combines multiple streams to improve recognition performance. The proposed network architecture comprises three fusion tiers: the hybrid fusion itself, the long-term fusion pooling layer, which models the long-term dynamics from the RGB and optical flow modalities, and the adaptive weighting scheme for combining the classification scores from several streams. We show that the hybrid fusion has different representations from the base modalities for training the cross-modal learning stream. We have conducted extensive experiments and shown that the proposed six-stream HHF network outperforms the existing two- and four-stream networks, achieving state-of-the-art recognition performance with accuracies of 97.2% and 76.7% on the UCF101 and HMDB51 datasets, respectively, both of which are widely used in action recognition studies.
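The third fusion tier combines per-stream classification scores with a weighting scheme. The abstract does not specify how the weights are computed, so the following is only a minimal sketch of weighted late fusion under a stated assumption: each stream's weight is a softmax over its validation accuracy. The function names (`stream_weights`, `fuse_scores`, `predict`), the `temperature` parameter, and the accuracy-based rule are all hypothetical stand-ins for illustration, not the paper's method.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def stream_weights(val_accuracies, temperature=0.05):
    """Hypothetical weighting rule (an assumption, not the paper's scheme):
    a softmax over per-stream validation accuracies. A smaller temperature
    concentrates more weight on the most accurate stream."""
    return softmax([acc / temperature for acc in val_accuracies])


def fuse_scores(stream_scores, weights):
    """Weighted sum of per-stream class-score vectors (late fusion)."""
    if len(stream_scores) != len(weights):
        raise ValueError("need exactly one weight per stream")
    num_classes = len(stream_scores[0])
    fused = [0.0] * num_classes
    for scores, weight in zip(stream_scores, weights):
        if len(scores) != num_classes:
            raise ValueError("all streams must score the same classes")
        for cls, score in enumerate(scores):
            fused[cls] += weight * score
    return fused


def predict(stream_scores, weights):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative numbers only: three streams scoring three classes.
weights = stream_weights([0.90, 0.80, 0.85])
scores = [
    [0.7, 0.2, 0.1],  # e.g. an RGB stream
    [0.1, 0.8, 0.1],  # e.g. an optical flow stream
    [0.6, 0.3, 0.1],  # e.g. a cross-modal stream
]
label = predict(scores, weights)
```

Because the weights sum to one and each stream's scores are a probability vector, the fused vector is also a probability distribution. The same functions apply unchanged to six streams.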


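Bibliographic details

Source
Neural computing & applications | 2020, Issue 14 | 12 pages
Original format: PDF
Language: English
Chinese Library Classification: Artificial neural network computers; Artificial intelligence theory
Keywords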
Action recognition; Deep architectures; Inception-ResNets; Video representations; Non-homogeneous fusion;


