Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

PoTion: Pose MoTion Representation for Action Recognition

Abstract

Most state-of-the-art methods for action recognition rely on a two-stream architecture that processes appearance and motion independently. In this paper, we claim that considering them jointly offers rich information for action recognition. We introduce a novel representation that gracefully encodes the movement of some semantic keypoints. We use the human joints as these keypoints and term our Pose moTion representation PoTion. Specifically, we first run a state-of-the-art human pose estimator [4] and extract heatmaps for the human joints in each frame. We obtain our PoTion representation by temporally aggregating these probability maps. This is achieved by 'colorizing' each of them depending on the relative time of the frames in the video clip and summing them. This fixed-size representation for an entire video clip is suitable to classify actions using a shallow convolutional neural network. Our experimental evaluation shows that PoTion outperforms other state-of-the-art pose representations [6, 48]. Furthermore, it is complementary to standard appearance and motion streams. When combining PoTion with the recent two-stream I3D approach [5], we obtain state-of-the-art performance on the JHMDB, HMDB and UCF101 datasets.
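The temporal aggregation step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the coloring scheme, so the piecewise-linear channel weights, the function names `channel_weights` and `aggregate_heatmaps`, the `(T, J, H, W)` input layout, and the default of two color channels are all hypothetical choices made here for clarity.

```python
import numpy as np


def channel_weights(num_frames, num_channels=2):
    """Return per-frame 'color' weights of shape (T, C).

    Assumed scheme (not taken from the abstract): each channel is a linear
    hat function centred at an evenly spaced relative time in [0, 1], so the
    weights of every frame sum to 1.
    """
    if num_channels < 2:
        raise ValueError("num_channels must be at least 2")
    # Relative time of each frame in the clip, from 0 (first) to 1 (last).
    t = np.linspace(0.0, 1.0, num_frames)
    centers = np.linspace(0.0, 1.0, num_channels)
    width = 1.0 / (num_channels - 1)
    return np.clip(1.0 - np.abs(t[:, None] - centers[None, :]) / width, 0.0, 1.0)


def aggregate_heatmaps(heatmaps, num_channels=2):
    """Colorize per-frame joint heatmaps by relative time and sum over time.

    heatmaps: array of shape (T, J, H, W), one heatmap per frame and joint.
    Returns an array of shape (J, C, H, W) whose size does not depend on T.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float64)
    weights = channel_weights(heatmaps.shape[0], num_channels)
    # For each joint: sum over frames of weight[t, c] * heatmap[t, j].
    return np.einsum("tc,tjhw->jchw", weights, heatmaps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = rng.random((7, 3, 4, 5))    # 7 frames, 3 joints, 4x5 maps
    long_clip = rng.random((20, 3, 4, 5))    # 20 frames, same joints and size
    print(aggregate_heatmaps(short_clip).shape)
    print(aggregate_heatmaps(long_clip).shape)
```

Because the time axis is summed out, clips of different lengths map to arrays of the same shape, which is what makes the representation usable as input to a fixed-size convolutional classifier.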
