Journal: Expert Systems with Applications

Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset

Abstract

Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during training. However, they often produce inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. We propose a student-teacher strategy in which a shallow student network is trained with auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the teacher network's depth predictions to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We use a data ensemble strategy that merges multiple depth predictions of the teacher network, improving the training samples by collecting nontrivial knowledge beyond a single prediction. To mitigate the inaccurate depth estimates used when training the student network, we further propose a stereo confidence-guided regression loss that handles unreliable pseudo depth values in occluded regions, textureless regions, and repetitive patterns. To complement the existing dataset of outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken with hand-held stereo cameras. Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. Experimental results on various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.
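
The abstract does not give the exact form of the data ensemble or of the confidence-guided loss, so the following is only a minimal sketch under stated assumptions. It assumes the teacher's multiple depth predictions are merged by a per-pixel median and that the student is trained with an L1 regression loss weighted by a per-pixel stereo confidence map, with low-confidence pixels masked out. The function names `ensemble_depth` and `confidence_guided_l1` and the `threshold` parameter are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def ensemble_depth(predictions):
    """Merge several teacher depth maps (each H x W) into one pseudo depth map.

    Assumption: a per-pixel median is used as the merge operator.
    """
    stacked = np.stack(predictions, axis=0)  # shape: (N, H, W)
    return np.median(stacked, axis=0)


def confidence_guided_l1(student_depth, pseudo_depth, confidence, threshold=0.3):
    """Confidence-weighted L1 loss between student output and pseudo depth.

    Assumption: pixels with confidence below `threshold` are ignored and the
    remaining pixels are weighted by their confidence value.
    """
    weights = np.where(confidence >= threshold, confidence, 0.0)
    total = weights.sum()
    if total == 0:
        return 0.0  # no reliable pixel in this sample
    error = np.abs(student_depth - pseudo_depth)
    return float((weights * error).sum() / total)


# Toy example: three teacher predictions over a 2 x 2 image.
teacher_preds = [np.full((2, 2), 10.0), np.full((2, 2), 12.0), np.full((2, 2), 11.0)]
pseudo = ensemble_depth(teacher_preds)

student = np.full((2, 2), 10.0)
conf = np.array([[1.0, 0.5], [0.1, 0.0]])  # bottom row treated as unreliable
loss = confidence_guided_l1(student, pseudo, conf)
```

In this toy case the two bottom-row pixels fall below the threshold and contribute nothing to the loss, which mirrors the idea of discounting pseudo depth values in occluded or textureless regions.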
