Journal: Computer Vision and Image Understanding

Exploiting deep residual networks for human action recognition from skeletal data

Abstract

The computer vision community is currently focused on action recognition in real videos, a problem that involves thousands of samples and many challenges. Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state of the art in vision-based action recognition systems. Recently, the Residual Network (ResNet), a single architecture that combines residual connections with a more traditional CNN model, has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets to human action recognition using skeletal data provided by depth sensors. First, the 3D coordinates of the human body joints in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images capture the spatio-temporal evolution of 3D motions in skeleton sequences and can be learned efficiently by D-CNNs. We then propose a novel deep learning architecture based on ResNets that learns features from the resulting color-based representations and classifies them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all three benchmarks while requiring fewer computational resources. In particular, it surpasses previous approaches by 3.4% on MSR Action 3D, 0.67% on KARD, and 2.5% on NTU-RGB+D.
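The skeleton-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact encoding, so everything here is an assumption made for illustration: the function name `skeleton_to_rgb`, the layout (one image row per joint, one column per frame), the channel mapping (x, y, z to R, G, B), and the min-max normalisation over the whole sequence.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = List[Joint]                  # all joints observed in one frame
Pixel = Tuple[int, int, int]         # (R, G, B), each in 0..255


def skeleton_to_rgb(sequence: List[Frame]) -> List[List[Pixel]]:
    """Encode a skeleton sequence as an image (rows = joints, columns = frames).

    Assumed encoding, not taken from the paper: every coordinate is min-max
    normalised over the whole sequence and scaled to 0..255, with
    x -> R, y -> G, z -> B.
    """
    if not sequence or not sequence[0]:
        raise ValueError("sequence must contain at least one frame with joints")

    # Global minimum and maximum over all coordinates of all joints and frames.
    values = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(values), max(values)
    span = hi - lo

    def to_byte(c: float) -> int:
        # A constant sequence has no range; map every value to 0.
        if span == 0:
            return 0
        return int(round(255 * (c - lo) / span))

    n_joints = len(sequence[0])
    image: List[List[Pixel]] = []
    for j in range(n_joints):
        # One row per joint; each pixel in the row is that joint at one frame.
        row = [tuple(to_byte(c) for c in frame[j]) for frame in sequence]
        image.append(row)
    return image
```

With this layout, the horizontal axis of the image carries the temporal evolution of the motion and the vertical axis carries the spatial arrangement of joints, which is one plausible way for a CNN to see both at once.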