IEEE International Conference on Robotics and Automation

Fusing LIDAR and images for pedestrian detection using convolutional neural networks

Abstract

In this paper, we explore various aspects of fusing LIDAR and color imagery for pedestrian detection in the context of convolutional neural networks (CNNs), which have recently become the state of the art for many vision problems. We incorporate LIDAR by up-sampling the point cloud to a dense depth map and then extracting three features that represent different aspects of the 3D scene. We then use those features as extra image channels. Specifically, we leverage recent work on HHA [9] (horizontal disparity, height above ground, and angle) representations, adapting the code to work on up-sampled LIDAR rather than Microsoft Kinect depth maps. We show, for the first time, that such a representation is applicable to up-sampled LIDAR data despite its sparsity. Since CNNs learn a deep hierarchy of feature representations, we then explore the question: at what level of representation should this additional information be fused with the original RGB image channels? We use the KITTI pedestrian detection dataset for our exploration. We first replicate the finding that region-CNNs (R-CNNs) [8] can outperform the original proposal mechanism using only RGB images, but only if fine-tuning is employed. Then we show that: 1) using HHA features together with RGB images performs better than RGB alone, even without any fine-tuning on large RGB web data; 2) fusing RGB and HHA achieves the strongest results if done late, but, under a parameter or computational budget, is best done at the early to middle layers of the hierarchical representation, which tend to represent mid-level features rather than low-level (e.g. edges) or high-level (e.g. object-class decision) features; 3) some of the less successful methods have the most parameters, indicating that increased classification accuracy is not simply a function of increased capacity in the neural network.
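
To make the depth encoding concrete, the sketch below builds a three-channel HHA-style image from an up-sampled LIDAR depth map. This is a minimal sketch, not the authors' code: the actual HHA implementation of [9] estimates surface normals and the gravity direction robustly, whereas here the angle channel is approximated from image-gradient normals, the height-above-ground map is assumed to be precomputed, and the focal length is a placeholder KITTI-like value.

```python
import numpy as np

def depth_to_hha(depth, height, f=721.5, eps=1e-6):
    """Encode an up-sampled LIDAR depth map as a 3-channel HHA-style image.

    depth  : (H, W) metric depth in meters from the up-sampled point cloud.
    height : (H, W) height of each pixel above the ground plane, in meters
             (assumed precomputed here; placeholder for the real estimate).
    f      : assumed focal length in pixels (KITTI-like placeholder).
    """
    # Channel 1: horizontal disparity, i.e. focal length over depth.
    disparity = f / np.maximum(depth, eps)

    # Channel 3: angle (in degrees) between a crude surface normal,
    # taken from depth-image gradients, and the camera's vertical axis.
    dzdy, dzdx = np.gradient(depth)
    normals = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(normals[..., 1], -1.0, 1.0)))

    def to_uint8(x):
        # Rescale each channel to [0, 255] so the HHA image can be
        # stacked with RGB and fed to an ImageNet-pretrained CNN.
        x = (x - x.min()) / max(x.max() - x.min(), eps)
        return (255.0 * x).astype(np.uint8)

    return np.stack(
        [to_uint8(disparity), to_uint8(height), to_uint8(angle)], axis=-1
    )
```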
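
The abstract's central question, where to fuse, can likewise be illustrated with a toy two-stream network. The PyTorch sketch below (the paper itself works within an R-CNN pipeline, and all layer sizes here are hypothetical) shows mid-level fusion by channel concatenation: each modality gets its own early layers, and the streams join before a shared trunk. Moving the `torch.cat` earlier or later corresponds to the early and late fusion points the paper compares.

```python
import torch
import torch.nn as nn

class MidFusionNet(nn.Module):
    """Toy two-stream classifier fusing RGB and HHA at a middle layer."""

    def __init__(self, num_classes=2, width=64):
        super().__init__()

        def stem():
            # Modality-specific early layers (low-level features).
            return nn.Sequential(
                nn.Conv2d(3, width, kernel_size=5, stride=2, padding=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.rgb_stem = stem()
        self.hha_stem = stem()
        # Shared trunk after mid-level fusion by channel concatenation.
        self.trunk = nn.Sequential(
            nn.Conv2d(2 * width, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_classes),  # pedestrian vs. background
        )

    def forward(self, rgb, hha):
        fused = torch.cat([self.rgb_stem(rgb), self.hha_stem(hha)], dim=1)
        return self.trunk(fused)

# Example: a batch of 4 crops, each a paired 3-channel RGB and HHA image.
net = MidFusionNet()
scores = net(torch.randn(4, 3, 128, 64), torch.randn(4, 3, 128, 64))
print(scores.shape)  # torch.Size([4, 2])
```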
