IEEE Transactions on Multimedia

Attention-Based Multiview Re-Observation Fusion Network for Skeletal Action Recognition


Abstract

Action recognition is an important and active area of computer vision. Owing to the usefulness of skeleton data for recognizing actions and to advances in pose estimation techniques, skeleton-based action recognition has drawn considerable attention in recent years. In this paper, we propose an attention-based multiview re-observation fusion model for skeletal action recognition. The model focuses on the observation viewpoint, a factor that strongly influences recognition, and exploits action information from multiple observation views to improve performance. In this method, we re-observe the input skeleton data from several possible viewpoints, process each augmented observation separately with a long short-term memory (LSTM) network, and finally fuse the per-view outputs to produce the recognition result. During multiview fusion, an attention mechanism regulates the fusion operation according to each view's helpfulness for recognition; the model thus learns both to fuse information across viewpoints and to evaluate how informative each observation view is. We also propose a multilayer feature-attention method to improve the performance of the LSTM in our model: an attention mechanism enhances the feature representation by locating and emphasizing informative feature dimensions according to contextual action information, and stacking multiple such attention layers in a multilayer LSTM network further improves performance. The final model is integrated into an end-to-end trainable network. Experiments on two popular datasets, NTU RGB+D and SBU Kinect Interaction, show that our model achieves state-of-the-art performance.
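As a rough illustration of the attention-weighted fusion step described in the abstract, the sketch below fuses one feature vector per re-observed viewpoint (e.g., the final LSTM hidden state for each view) into a single representation. This is a minimal NumPy sketch under our own assumptions; the function and the scoring parameter `w` are hypothetical and not taken from the paper, which learns its attention weights end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(view_features, w):
    """Attention-weighted fusion of per-view features.

    view_features: (num_views, feat_dim) array, one feature vector per
        re-observed viewpoint (e.g., a per-view LSTM output).
    w: (feat_dim,) scoring vector (a hypothetical stand-in for the
        learned attention parameters).
    Returns the fused feature vector and the attention weights.
    """
    scores = view_features @ w        # (num_views,) relevance score per view
    alpha = softmax(scores)           # attention weights, sum to 1
    fused = alpha @ view_features     # weighted sum over views
    return fused, alpha
```

In the paper's model, views whose observations are more helpful for recognition receive larger weights, so uninformative or occluded viewpoints contribute less to the fused representation.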
