Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Learning Monocular 3D Human Pose Estimation from Multi-view Images


Abstract

Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views, at training time only. Specifically, we train the system to predict the same pose in all views. Such a consistency constraint is necessary but not sufficient to predict accurate poses. We therefore complement it with a supervised loss aiming to predict the correct pose in a small set of labeled images, and with a regularization term that penalizes drift from initial predictions. Furthermore, we propose a method to estimate camera pose jointly with human pose, which lets us utilize multiview footage where calibration is difficult, e.g., for pan-tilt or moving handheld cameras. We demonstrate the effectiveness of our approach on established benchmarks, as well as on a new Ski dataset with rotating cameras and expert ski motion, for which annotations are truly hard to obtain.
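The three training terms named in the abstract (multi-view consistency, a supervised loss on a few labeled images, and a penalty on drift from initial predictions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared-error form of each term, the weights `w_sup` and `w_reg`, and the assumption that camera rotations are known and poses are root-centered are all hypothetical choices made for illustration.

```python
import numpy as np


def rot_z(angle):
    """Rotation matrix about the z axis (helper used only to build examples)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def multiview_consistency_loss(poses_cam, rotations):
    """Penalize disagreement between views after mapping to a shared frame.

    poses_cam: (V, J, 3) root-centered poses predicted in each camera frame.
    rotations: (V, 3, 3) camera-to-world rotations (assumed known here).
    """
    # p_world = R @ p_cam, applied per view and per joint.
    poses_world = np.einsum("vij,vkj->vki", rotations, poses_cam)
    mean_pose = poses_world.mean(axis=0, keepdims=True)
    return float(np.mean(np.sum((poses_world - mean_pose) ** 2, axis=-1)))


def supervised_loss(pred, target):
    """Mean squared joint error on the small labeled subset."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


def drift_regularizer(pred, initial_pred):
    """Penalize predictions that move away from the initial network's output."""
    return float(np.mean(np.sum((pred - initial_pred) ** 2, axis=-1)))


def total_loss(poses_cam, rotations, labeled_pred, labeled_gt,
               initial_poses_cam, w_sup=1.0, w_reg=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (multiview_consistency_loss(poses_cam, rotations)
            + w_sup * supervised_loss(labeled_pred, labeled_gt)
            + w_reg * drift_regularizer(poses_cam, initial_poses_cam))
```

The consistency term alone is minimized by any prediction that agrees across views, including a constant pose, which is one way to see why the abstract calls it necessary but not sufficient and pairs it with the other two terms.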
