IEEE Transactions on Pattern Analysis and Machine Intelligence

View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition



Abstract

Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and popularity of 3D skeleton data. One of the key challenges in action recognition lies in the large variation of action representations when they are captured from different viewpoints. To alleviate the effects of view variation, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints over the course of an action in a learning-based, data-driven manner. Instead of re-positioning the skeletons using a fixed, human-defined prior criterion, we design two view adaptive neural networks, i.e., VA-RNN and VA-CNN, built on a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) and a convolutional neural network (CNN), respectively. For each network, a novel view adaptation module learns and determines the most suitable observation viewpoints and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies show that the proposed view adaptive models are capable of transforming skeletons from various views to much more consistent virtual viewpoints. The models thereby largely eliminate the influence of viewpoint, enabling the networks to focus on learning action-specific features and thus achieve superior performance. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the final prediction, obtaining enhanced performance. Moreover, random rotation of skeleton sequences is employed to improve the robustness of the view adaptation models and to alleviate overfitting during training. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches.
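The geometric core of the scheme described above — re-observing each skeleton sequence from learned virtual viewpoints, then late-fusing the two networks' scores — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the per-frame Euler angles and translation are assumed to come from the view-adaptation subnetwork (here they are simply passed in), and the joint layout, angle convention, and fusion weight are hypothetical.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Rotation about the X, Y, Z axes by Euler angles (radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def view_adapt(skeleton, angles, translation):
    """Re-observe a skeleton sequence from per-frame virtual viewpoints.

    skeleton:    (T, J, 3) joint coordinates over T frames, J joints
    angles:      (T, 3) per-frame Euler angles (in the paper these would
                 be predicted by the view-adaptation subnetwork)
    translation: (T, 3) per-frame translation of the virtual observation point
    """
    out = np.empty_like(skeleton)
    for t in range(skeleton.shape[0]):
        R = rotation_matrix(*angles[t])
        # Shift joints relative to the virtual viewpoint, then rotate
        # them into that viewpoint's coordinate frame.
        out[t] = (skeleton[t] - translation[t]) @ R.T
    return out

def fuse_scores(score_rnn, score_cnn, w=0.5):
    """VA-fusion-style late fusion: weighted sum of the two streams' scores."""
    return w * score_rnn + (1.0 - w) * score_cnn
```

With zero angles and zero translation the transform is the identity, so the consistency of the learned viewpoints — not the transform itself — is what removes view variation; the same `rotation_matrix` helper could also implement the random-rotation augmentation mentioned in the abstract by sampling the angles instead of predicting them.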

