IEEE/CVF Conference on Computer Vision and Pattern Recognition

Multi-level Fusion Based 3D Object Detection from Monocular Images

Abstract

In this paper, we present an end-to-end, multi-level fusion based framework for 3D object detection from a single monocular image. The network is composed of two parts: one for 2D region proposal generation and another for simultaneous prediction of objects' 2D locations, orientations, dimensions, and 3D locations. With the help of a stand-alone module that estimates the disparity and computes the 3D point cloud, we introduce the multi-level fusion scheme. First, we encode the disparity information with a front-view feature representation and fuse it with the RGB image to enhance the input. Second, features extracted from the original input and from the point cloud are combined to boost object detection. For 3D localization, we introduce an extra stream that predicts the location information directly from the point cloud and add its output to the aforementioned location prediction. The proposed algorithm directly outputs both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as input. Experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms state-of-the-art monocular methods.
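The abstract does not spell out how disparity becomes a point cloud or how the front-view encoding is fused with the RGB image. The sketch below illustrates one plausible reading using the standard pinhole and stereo relations (Z = f·b/d, then back-projection through the intrinsics), followed by a per-pixel concatenation of RGB and XYZ into a six-channel input. The function names, the zero-fill for invalid pixels, and the choice of concatenation as the fusion step are assumptions made for illustration, not details confirmed by the abstract. Plain Python lists are used so that the example has no dependencies.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def disparity_to_point_map(
    disparity: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    baseline: float,
) -> List[List[Optional[Point]]]:
    """Back-project a disparity map into a per-pixel (X, Y, Z) map.

    Standard stereo/pinhole geometry:
        Z = fx * baseline / d
        X = (u - cx) * Z / fx
        Y = (v - cy) * Z / fy
    Pixels with non-positive disparity have no valid depth and map to None.
    """
    point_map: List[List[Optional[Point]]] = []
    for v, row in enumerate(disparity):
        out_row: List[Optional[Point]] = []
        for u, d in enumerate(row):
            if d <= 0:
                out_row.append(None)
                continue
            z = fx * baseline / d
            out_row.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
        point_map.append(out_row)
    return point_map


def fuse_with_rgb(rgb, point_map):
    """Concatenate RGB and XYZ per pixel into a 6-channel front-view input.

    Hypothetical fusion step: invalid points are filled with zeros,
    which is an assumption of this sketch.
    """
    fused = []
    for rgb_row, pt_row in zip(rgb, point_map):
        fused.append([
            tuple(px) + (pt if pt is not None else (0.0, 0.0, 0.0))
            for px, pt in zip(rgb_row, pt_row)
        ])
    return fused


if __name__ == "__main__":
    # Toy 2x2 example with made-up intrinsics and baseline.
    disp = [[10.0, 0.0], [5.0, 20.0]]
    rgb = [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]]
    pm = disparity_to_point_map(disp, fx=100.0, fy=100.0, cx=0.5, cy=0.5, baseline=0.5)
    print(fuse_with_rgb(rgb, pm)[0][0])
```

Keeping the XYZ values in the image grid, rather than as an unordered point list, is what makes a front-view representation convenient here: each pixel of the RGB image has a matching 3D coordinate, so the two can be stacked channel-wise and passed to an ordinary 2D convolutional network.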
