
Deep Neural Networks Using Capsule Networks and Skeleton-Based Attentions for Action Recognition

Abstract

This work develops Deep Neural Networks (DNNs) that adopt Capsule Networks (CapsNets) and spatiotemporal skeleton-based attention to recognize subject actions effectively from the abundant spatial and temporal contexts of videos. The proposed generic DNN comprises four 3D Convolutional Neural Networks (3D_CNNs), Attention-Jointed Appearance (AJA) and Attention-Jointed Motion (AJM) generation layers, two Reduction Layers (RLs), two Attention-based Recurrent Neural Networks (A_RNNs), and an inference classifier; its inputs are RGB, transformed-skeleton, and optical-flow channel streams. The AJA and AJM generation layers use the skeleton stream to emphasize the subject's appearance and motion, respectively, and the A_RNNs generate attention weights over time steps to highlight rich temporal contexts. To integrate CapsNets into this generic DNN, three CapsNet-based DNNs are devised, in which a CapsNet replaces the classifier, the A_RNN+classifier, or the RL+A_RNN+classifier. Experimental results reveal that the proposed DNN using a CapsNet as the inference classifier outperforms the other two CapsNet-based DNNs as well as the generic DNN, which adopts a feedforward neural network as its inference classifier. To the best of our knowledge, our best CapsNet-based DNN achieves state-of-the-art average accuracy of 98.5% on UCF101, near-state-of-the-art average accuracy of 82.1% on HMDB51, and 95.3% on panoramic videos. In particular, we find that the generic CapsNet serves as an outstanding inference classifier but is slightly worse than the A_RNN at interpreting temporal evidence for recognition. Therefore, the proposed DNN, which employs a CapsNet as its inference classifier, can be applied advantageously to various context-aware visual applications.
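To make the pipeline concrete, below is a minimal PyTorch sketch of the generic DNN described in the abstract: the three channel streams feed small 3D CNNs, the skeleton features gate the appearance and motion features (the AJA and AJM generation layers), reduction layers compress each time step, attention-based RNNs pool over time, and a feedforward head classifies. All layer sizes, the sigmoid gating, and the shared skeleton backbone are illustrative assumptions rather than the paper's exact configuration; in the paper's best variant, the feedforward head would be replaced by a CapsNet classifier.

```python
# Minimal sketch of the generic pipeline; module names follow the paper's
# terminology (AJA/AJM, RL, A_RNN), but all sizes and wiring are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Small3DCNN(nn.Module):
    """Tiny stand-in for one of the paper's 3D_CNN feature extractors."""
    def __init__(self, in_ch, out_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((8, 4, 4)),   # pool to (T=8, H=4, W=4)
        )

    def forward(self, x):                      # x: (B, C, T, H, W)
        return self.net(x)                     # (B, out_ch, 8, 4, 4)


class AttentionRNN(nn.Module):
    """A_RNN: GRU hidden states pooled with learned attention over time steps."""
    def __init__(self, in_dim, hid=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid, batch_first=True)
        self.att = nn.Linear(hid, 1)

    def forward(self, seq):                    # seq: (B, T, in_dim)
        h, _ = self.gru(seq)                   # (B, T, hid)
        w = F.softmax(self.att(h), dim=1)      # attention weights over T
        return (w * h).sum(dim=1)              # (B, hid)


class GenericActionNet(nn.Module):
    """Generic DNN: three streams, AJA/AJM gating, RLs, A_RNNs, classifier."""
    def __init__(self, num_classes=101, hid=64):
        super().__init__()
        # The paper uses four 3D_CNNs; this sketch shares a single skeleton
        # backbone across the appearance and motion paths for brevity.
        self.rgb_cnn = Small3DCNN(3)           # appearance stream
        self.skel_cnn = Small3DCNN(1)          # transformed-skeleton stream
        self.flow_cnn = Small3DCNN(2)          # optical-flow stream
        feat = 32 * 4 * 4                      # per-time-step feature size
        self.reduce_a = nn.Linear(feat, 128)   # RL on the AJA path
        self.reduce_m = nn.Linear(feat, 128)   # RL on the AJM path
        self.rnn_a = AttentionRNN(128, hid)
        self.rnn_m = AttentionRNN(128, hid)
        # Feedforward inference classifier; the paper's best variant swaps
        # this head for a CapsNet classifier.
        self.classifier = nn.Linear(2 * hid, num_classes)

    def forward(self, rgb, skel, flow):
        gate = torch.sigmoid(self.skel_cnn(skel))   # skeleton-based attention
        aja = self.rgb_cnn(rgb) * gate              # Attention-Jointed Appearance
        ajm = self.flow_cnn(flow) * gate            # Attention-Jointed Motion
        # Flatten spatial dims; treat pooled frames as a sequence: (B, T, feat)
        a = self.reduce_a(aja.permute(0, 2, 1, 3, 4).flatten(2))
        m = self.reduce_m(ajm.permute(0, 2, 1, 3, 4).flatten(2))
        z = torch.cat([self.rnn_a(a), self.rnn_m(m)], dim=1)
        return self.classifier(z)


if __name__ == "__main__":
    # Smoke test on random 16-frame clips at 32x32 resolution.
    rgb = torch.randn(2, 3, 16, 32, 32)
    skel = torch.randn(2, 1, 16, 32, 32)
    flow = torch.randn(2, 2, 16, 32, 32)
    print(GenericActionNet()(rgb, skel, flow).shape)   # torch.Size([2, 101])
```

The elementwise sigmoid gate is one plausible reading of "emphasizing skeletons to the appearances and motions"; the key structural point it illustrates is that skeleton features modulate both the RGB and optical-flow paths before temporal attention and classification.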
