
Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks

Abstract

Scene flow describes the motion of 3D objects in the real world and could potentially serve as the basis of a good feature for 3D action recognition. However, its use for action recognition, especially in the context of convolutional neural networks (ConvNets), has not been studied before. In this paper, we propose the extraction and use of scene flow for action recognition from RGB-D data. Previous works have considered the depth and RGB modalities as separate channels and extracted features from each for later fusion. We take a different approach and consider the two modalities as one entity, allowing features for action recognition to be extracted jointly from the start. Two key questions about the use of scene flow for action recognition are addressed: how to organize the scene flow vectors, and how to represent the long-term dynamics of videos based on scene flow. In order to compute scene flow correctly on the available datasets, we propose an effective self-calibration method that spatially aligns the RGB and depth data without knowledge of the camera parameters. Based on the scene flow vectors, we propose a new representation, namely Scene Flow to Action Map (SFAM), which describes several forms of long-term spatio-temporal dynamics for action recognition. We adopt a channel transform kernel that maps the scene flow vectors to an optimal color space analogous to RGB, which takes better advantage of ConvNet models pre-trained on ImageNet. Experimental results indicate that this new representation surpasses the performance of state-of-the-art methods on two large public datasets.
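To make the pipeline concrete, below is a minimal Python sketch of the SFAM idea as described in the abstract. Everything in it is an illustrative assumption rather than the paper's actual implementation: the function names are hypothetical, the 3x3 channel-transform kernel is a placeholder for the optimized one the paper learns, plain temporal summation stands in for the paper's long-term aggregation variants, and the scene-flow fields themselves are random stand-ins for values estimated from aligned RGB-D frames.

```python
import numpy as np

def channel_transform(flow_map, kernel):
    """Map per-pixel scene-flow vectors (dx, dy, dz) into an RGB-like
    color space with a 3x3 transform kernel, then rescale to [0, 255]
    so the result can be fed to an ImageNet-pretrained ConvNet."""
    h, w, _ = flow_map.shape
    mapped = (flow_map.reshape(-1, 3) @ kernel.T).reshape(h, w, 3)
    lo, hi = mapped.min(), mapped.max()
    return ((mapped - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

def scene_flow_to_action_map(flows, kernel):
    """Collapse a sequence of per-frame scene-flow fields into a single
    action-map image. Temporal summation is used here purely for
    illustration; the paper proposes several long-term aggregation
    schemes to capture the spatio-temporal dynamics."""
    accumulated = np.zeros_like(flows[0], dtype=np.float64)
    for flow in flows:
        accumulated += flow
    return channel_transform(accumulated, kernel)

# Toy usage: 10 frames of random 64x48 "scene flow" and an identity
# kernel standing in for the learned channel transform.
flows = [np.random.randn(48, 64, 3) for _ in range(10)]
sfam = scene_flow_to_action_map(flows, np.eye(3))
print(sfam.shape, sfam.dtype)  # (48, 64, 3) uint8 -- a ConvNet-ready image
```

The design intuition, per the abstract, is that encoding motion as an ordinary color image lets standard ConvNets pre-trained on ImageNet be reused without architectural changes.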