European Conference on Computer Vision

Generic 3D Representation via Pose Estimation and Matching



Abstract

Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation by solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide-baseline feature matching. Our method is based on the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning, and shows traits of abstraction (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate that our representation achieves state-of-the-art wide-baseline feature matching results without requiring a priori rectification (unlike SIFT and the majority of learned features). We also show 6DOF camera pose estimation given a pair of local image patches. The accuracy of both supervised tasks is comparable to that of humans. Finally, we contribute a large-scale dataset composed of object-centric street-view scenes along with point correspondences and camera pose information, and conclude with a discussion of the learned representation and open research questions.
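The multi-task setup the abstract describes (a shared ConvNet applied to a pair of patches, with one head regressing relative 6DOF camera pose and another scoring wide-baseline matches) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual architecture; the class and head names, layer sizes, and the 6-dimensional pose parameterization are our assumptions.

```python
import torch
import torch.nn as nn

class Siamese3DNet(nn.Module):
    """Hypothetical sketch of a siamese multi-task ConvNet:
    a shared trunk embeds each patch of a pair, and two heads
    consume the concatenated embeddings (names/sizes are assumptions)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        # Shared trunk: both patches pass through the same weights.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Head 1: regress relative 6DOF pose (3 translation + 3 rotation params).
        self.pose_head = nn.Linear(2 * feat_dim, 6)
        # Head 2: binary wide-baseline match / non-match score.
        self.match_head = nn.Linear(2 * feat_dim, 1)

    def forward(self, patch_a, patch_b):
        f = torch.cat([self.trunk(patch_a), self.trunk(patch_b)], dim=1)
        return self.pose_head(f), self.match_head(f)

net = Siamese3DNet()
a = torch.randn(4, 3, 64, 64)  # batch of 4 RGB patches
b = torch.randn(4, 3, 64, 64)
pose, match = net(a, b)
```

After training on the two supervised tasks, the trunk's output would serve as the "generic 3D representation" evaluated on the novel tasks without fine-tuning.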
