
AR3D: Attention Residual 3D Network for Human Action Recognition


Abstract

At present, in the field of video-based human action recognition, deep neural networks are mainly divided into two branches: 2D convolutional neural networks (CNNs) and 3D CNNs. However, a 2D CNN extracts temporal and spatial features independently of each other, so it tends to ignore the internal connection between them, which hurts recognition performance. Although a 3D CNN can extract the temporal and spatial features of a video sequence at the same time, the number of parameters of the 3D model increases dramatically, making the model difficult to train and transfer. To solve this problem, this article improves the existing 3D CNN model by combining it with a residual structure and an attention mechanism, and proposes two types of human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). Firstly, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the number of parameters and strengthens the extraction of temporal features. Secondly, we explore the application of the attention mechanism to human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human actions. Finally, in order to make full use of both the residual structure and the attention mechanism, the Attention Residual 3D Network (AR3D) is proposed, and its two fusion strategies and the corresponding model structures (AR3D_V1, AR3D_V2) are introduced in detail. Experiments show that the fused structures yield varying degrees of performance improvement over either single structure.
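
To make the two building blocks concrete, below is a minimal PyTorch-style sketch of a 3D residual block fused with a simple spatio-temporal attention module. The class names, layer sizes, and the choice to apply attention on the residual branch before the identity shortcut are illustrative assumptions for exposition, not the paper's exact R3D, AR3D_V1, or AR3D_V2 designs.

# Minimal sketch (assumed PyTorch): a 3D residual block whose residual branch
# is reweighted by a simple spatio-temporal attention module. All names and
# hyperparameters below are illustrative, not the paper's exact architecture.
import torch
import torch.nn as nn


class SpatioTemporalAttention3D(nn.Module):
    """Channel-wise attention over globally pooled spatio-temporal features (illustrative)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # squeeze T x H x W down to 1 x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                            # per-channel attention weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))             # reweight the input feature map


class AttentionResidualBlock3D(nn.Module):
    """3D residual block with attention applied to the residual branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.attn = SpatioTemporalAttention3D(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attn(out)                          # attention on the residual branch
        return self.relu(out + x)                     # identity shortcut, then activation


if __name__ == "__main__":
    clip = torch.randn(2, 64, 16, 56, 56)             # (batch, channels, frames, height, width)
    block = AttentionResidualBlock3D(64)
    print(block(clip).shape)                           # torch.Size([2, 64, 16, 56, 56])

The sketch keeps the input and output shapes identical so that blocks can be stacked; how many such blocks are used, where attention is inserted, and how the two branches are fused are exactly the design choices the paper's AR3D_V1 and AR3D_V2 variants explore.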