AAAI Conference on Artificial Intelligence

3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training



Abstract

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress made in recent years. Generally, the performance of existing methods drops when the target person is too small or too large, or when the motion is too fast or too slow, relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained under severe occlusion, which compromises their ability to handle occluded input. To address these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. Since humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features to predict 2D joints, or keypoints, in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate the 3D joints. Furthermore, we design a spatio-temporal discriminator based on body structure as well as limb motion to assess whether a predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate occlusion cases ranging from minor to severe, so that our network learns better and becomes robust to various degrees of occlusion. Because 3D ground-truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public data sets validate the effectiveness of our method, and our ablation studies show the strengths of the network's individual submodules.
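The explicit occlusion training described above can be illustrated with a minimal sketch: randomly hide a subset of 2D keypoints per frame (zeroing coordinates and clearing a visibility flag) before feeding the sequence to the temporal network. The function name, array shapes, joint count, and masking probabilities below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mask_keypoints(keypoints, visibility, max_masked=5, rng=None):
    """Simulate occlusion by hiding up to `max_masked` random joints per frame.

    keypoints:  (T, J, 2) array of 2D joint coordinates over T frames.
    visibility: (T, J) binary array, 1 = joint visible.
    Returns masked copies; the input arrays are left untouched.
    """
    rng = rng or np.random.default_rng()
    kps = keypoints.copy()
    vis = visibility.copy()
    T, J, _ = kps.shape
    for t in range(T):
        # Vary occlusion severity per frame, from none (minor) to max_masked (severe).
        n = int(rng.integers(0, max_masked + 1))
        hidden = rng.choice(J, size=n, replace=False)
        kps[t, hidden] = 0.0   # zero out the occluded joints' coordinates
        vis[t, hidden] = 0     # mark them invisible for the downstream network
    return kps, vis

# Example: a clip of 4 frames with 17 COCO-style joints (an assumed layout).
kps = np.random.rand(4, 17, 2)
vis = np.ones((4, 17), dtype=int)
masked_kps, masked_vis = mask_keypoints(kps, vis, rng=np.random.default_rng(0))
```

In a training loop, an augmentation like this would be applied on the fly so each clip is seen with a different occlusion pattern every epoch.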
