International Conference on Image and Signal Processing

Multistage Deep Neural Network Framework for People Detection and Localization Using Fusion of Visible and Thermal Images

Abstract

Object detection and classification are active fields of research in computer vision, with applications spanning surveillance, autonomous vehicles, and robotic vision. Researchers have built many intelligent systems that aim for the accuracy of human perception, but none have quite reached it yet. Convolutional Neural Networks (CNNs) and deep learning architectures are used to approach human-like perception for object detection and scene identification. We propose a novel method that combines previously used techniques: a model that takes multi-spectral images, fuses them, discards uninformative images, and then provides semantic segmentation for each object (person) present in the image. In our proposed methodology we use a CNN to fuse visible and thermal images, and deep learning architectures for classification and localization. Fusion of visible and thermal images combines the informative features of both into a single image; for fusion we use an encoder-decoder architecture. The fused image is then fed into a ResNet-152 architecture for image classification. Images obtained from ResNet-152 are then fed into Mask R-CNN, which uses a ResNet-101 architecture, for person localization. The results clearly show that the fused model for object localization outperforms the visible-only model and gives promising results for person detection in surveillance settings. Our proposed model achieves a miss rate of 5.25%, which is much better than the previous state-of-the-art method on the KAIST dataset.
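The abstract describes a three-stage pipeline: encoder-decoder fusion of the visible and thermal frames, a ResNet-152 classifier that discards frames without people, and Mask R-CNN for per-person localization. The sketch below is a minimal, hypothetical PyTorch rendering of that structure, not the authors' code: FusionEncoderDecoder and MultistagePeopleDetector are illustrative placeholder modules with untrained weights, and torchvision's maskrcnn_resnet50_fpn (ResNet-50 FPN backbone) stands in for the ResNet-101 backbone reported in the paper.

```python
import torch
import torch.nn as nn
import torchvision


class FusionEncoderDecoder(nn.Module):
    """Hypothetical encoder-decoder that fuses an RGB frame and an aligned
    thermal frame (stacked as 4 input channels) into one 3-channel image."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, visible, thermal):
        # visible: (N, 3, H, W), thermal: (N, 1, H, W)  ->  fused: (N, 3, H, W)
        return self.decoder(self.encoder(torch.cat([visible, thermal], dim=1)))


class MultistagePeopleDetector(nn.Module):
    """Stage 1: fuse the modalities.  Stage 2: a (here untrained) ResNet-152
    scores whether the fused frame contains a person at all.  Stage 3: Mask
    R-CNN localizes and segments the persons in the frames that survive."""

    def __init__(self, person_score_threshold=0.5):
        super().__init__()
        self.fusion = FusionEncoderDecoder()
        self.classifier = torchvision.models.resnet152(num_classes=2)  # person / no person
        # COCO-pretrained Mask R-CNN (torchvision >= 0.13); person is COCO label 1.
        # The paper reports a ResNet-101 backbone; torchvision ships ResNet-50 FPN.
        self.detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
        self.person_score_threshold = person_score_threshold

    @torch.no_grad()
    def forward(self, visible, thermal):
        fused = self.fusion(visible, thermal)
        # Stage 2: drop "useless" frames, i.e. those scored as containing no person.
        person_prob = self.classifier(fused).softmax(dim=1)[:, 1]
        keep = person_prob > self.person_score_threshold
        if not keep.any():
            return []
        # Stage 3: Mask R-CNN takes a list of 3xHxW tensors in [0, 1] and returns,
        # per image, a dict with 'boxes', 'labels', 'scores' and 'masks'.
        return self.detector(list(fused[keep]))


if __name__ == "__main__":
    model = MultistagePeopleDetector().eval()   # eval mode: Mask R-CNN inference path
    visible = torch.rand(1, 3, 256, 320)        # RGB frame in [0, 1]
    thermal = torch.rand(1, 1, 256, 320)        # aligned thermal frame in [0, 1]
    print(model(visible, thermal))
```

The intermediate classification stage acts as a gate: the comparatively expensive instance-segmentation stage only runs on fused frames that are likely to contain a person, which is how the pipeline "drops the useless images" before localization.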
