AAAI Conference on Artificial Intelligence

3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training



Abstract

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress made in recent years. Generally, the performance of existing methods drops when the target person is too small or too large, or when the motion is too fast or too slow, relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained under severe occlusion, which compromises their ability to handle occluded input. To address these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. Since humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features to predict 2D joints, or keypoints, in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate the 3D joints. Furthermore, we design a spatio-temporal discriminator based on body structure as well as limb motion to assess whether a predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate occlusion cases ranging from minor to severe, so that our network learns better and becomes robust to various degrees of occlusion. Because 3D ground-truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public data sets validate the effectiveness of our method, and our ablation studies show the strengths of the network's individual submodules.
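The explicit occlusion training described above can be illustrated with a minimal sketch: randomly hide a subset of 2D keypoints per frame (zeroing coordinates and clearing a visibility flag) before feeding the sequence to the temporal network. The function name, array shapes, joint count, and masking probabilities below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mask_keypoints(keypoints, visibility, max_masked=5, rng=None):
    """Simulate occlusion by hiding up to `max_masked` random joints per frame.

    keypoints:  (T, J, 2) array of 2D joint coordinates over T frames.
    visibility: (T, J) binary array, 1 = joint visible.
    Returns masked copies; the input arrays are left untouched.
    """
    rng = rng or np.random.default_rng()
    kps = keypoints.copy()
    vis = visibility.copy()
    T, J, _ = kps.shape
    for t in range(T):
        # Vary occlusion severity per frame, from none (minor) to max_masked (severe).
        n = int(rng.integers(0, max_masked + 1))
        hidden = rng.choice(J, size=n, replace=False)
        kps[t, hidden] = 0.0   # zero out the occluded joints' coordinates
        vis[t, hidden] = 0     # mark them invisible for the downstream network
    return kps, vis

# Example: a clip of 4 frames with 17 COCO-style joints (an assumed layout).
kps = np.random.rand(4, 17, 2)
vis = np.ones((4, 17), dtype=int)
masked_kps, masked_vis = mask_keypoints(kps, vis, rng=np.random.default_rng(0))
```

In a training loop, an augmentation like this would be applied on the fly so each clip is seen with a different occlusion pattern every epoch.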
