IEEE/CVF Conference on Computer Vision and Pattern Recognition

KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

Abstract

Estimating the 3D pose of desktop objects is crucial for applications such as robotic manipulation. Many existing approaches to this problem require a depth map of the object for both training and prediction, which restricts them to opaque, Lambertian objects that produce good returns in an RGBD sensor. In this paper we forgo using a depth sensor in favor of raw stereo input. We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses using 3D keypoints, from stereo input, and works even for transparent objects. To evaluate the performance of our method, we create a dataset of 15 clear objects in five classes, with 48K 3D-keypoint labeled images. We train both instance and category models, and show generalization to new textures, poses, and objects. KeyPose surpasses state-of-the-art performance in 3D pose estimation on this dataset by factors of 1.5 to 3.5, even in cases where the competing method is provided with ground-truth depth. Stereo input is essential for this performance: it improves results by a factor of 2 compared with monocular input. We will release a public version of the data capture and labeling pipeline, the transparent object database, and the KeyPose models and evaluation code. Project website: https://sites.google.com/corp/view/keypose.
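The abstract does not specify KeyPose's output format or how a pose is derived from its keypoints. The sketch below is a minimal illustration under stated assumptions. It assumes a rectified stereo pair and per-keypoint estimates given as pixel coordinates plus disparity. It lifts those estimates to 3D camera coordinates with the standard pinhole relation Z = fx * baseline / disparity. It then recovers a 6-DoF pose by least-squares rigid alignment to a reference keypoint set, using the Kabsch algorithm. Kabsch is swapped in here for illustration and is not necessarily the paper's method. The function names `stereo_to_3d`, `project_to_uvd`, and `fit_rigid_pose`, along with all intrinsics and keypoint values, are hypothetical.

```python
import numpy as np

def stereo_to_3d(uvd, fx, cx, cy, baseline, fy=None):
    """Back-project (u, v, disparity) rows from a rectified stereo pair to 3D.

    Assumes pinhole intrinsics for the left camera and a horizontal baseline
    in metres, so depth Z = fx * baseline / disparity.
    """
    fy = fx if fy is None else fy
    uvd = np.asarray(uvd, dtype=float)
    u, v, d = uvd[:, 0], uvd[:, 1], uvd[:, 2]
    z = fx * baseline / d
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def project_to_uvd(points, fx, cx, cy, baseline, fy=None):
    """Inverse of stereo_to_3d: project 3D camera-frame points to (u, v, disparity)."""
    fy = fx if fy is None else fy
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy, fx * baseline / z], axis=1)

def fit_rigid_pose(model_pts, observed_pts):
    """Kabsch alignment: find R, t minimising sum ||R @ p_i + t - q_i||^2."""
    P = np.asarray(model_pts, dtype=float)
    Q = np.asarray(observed_pts, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    s = np.sign(np.linalg.det(Vt.T @ U.T))      # flip if SVD yields a reflection
    R = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Synthetic round trip with made-up intrinsics and keypoints.
fx, cx, cy, baseline = 600.0, 320.0, 240.0, 0.06
model = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.08, 0.0], [0.0, 0.0, 0.03]])
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.5])
cam_pts = model @ R_true.T + t_true              # rows are R_true @ p + t_true
uvd = project_to_uvd(cam_pts, fx, cx, cy, baseline)
R_est, t_est = fit_rigid_pose(model, stereo_to_3d(uvd, fx, cx, cy, baseline))
```

With noise-free inputs, the round trip recovers `R_true` and `t_true` up to floating-point error. Real predictions would be noisy, and a robust variant such as RANSAC over keypoint subsets might then be preferable.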