Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

Spatial Structure Preserving Feature Pyramid Network for Semantic Image Segmentation


Abstract

Semantic image segmentation has recently made substantial progress, benefiting from the rapid development of Convolutional Neural Networks. Most recently proposed approaches are based on Fully Convolutional Networks (FCNs). However, these FCN-based methods use large receptive fields and too many pooling layers to depict the discriminative semantic information of the images. Specifically, on the one hand, convolutional kernels with large receptive fields smooth detailed edges, since too much contextual information is used to depict the "center pixel." On the other hand, pooling layers enlarge the receptive field by shrinking the latest feature maps, which loses much detailed information about the image, especially in the deeper layers of the network. These operations often cause low spatial resolution in deep layers, which leads to spatially fragmented predictions. To address this problem, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to extract feature maps at different resolutions and take full advantage of them through gradual stacked fusion. Specifically, for two adjacent convolutional layers, we upsample the features from the deeper layer with a stride of 2 and then stack them on the features from the shallower layer. A convolutional layer with 1 × 1 kernels then fuses the stacked features. The fused feature preserves the spatial structure information of the image while retaining strong discriminative capability for pixel classification. Additionally, to further preserve the spatial structure information and regional connectivity of the predicted category label map, we propose a novel loss term for the network. In detail, we propose two graph model-based spatial affinity matrices, which depict the pixel-level relationships in the input image and in the predicted category label map, respectively; their cosine distance is then backpropagated through the network. The proposed architecture, called the spatial structure preserving feature pyramid network, significantly improves the spatial resolution of the predicted category label map for semantic image segmentation. The proposed method achieves state-of-the-art results on three challenging public datasets for semantic image segmentation.
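
The stacked fusion step described in the abstract (upsample the deeper features by 2, stack them on the shallower features, fuse with a 1 × 1 convolution) can be illustrated with a minimal sketch in plain Python. The abstract does not say how the upsampling is performed, so nearest-neighbour repetition is used here purely as an assumption; the function names `upsample2x` and `fuse`, the weights, and the toy inputs are all hypothetical and not taken from the paper.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of one channel (a list of rows).

    Assumption: the abstract only says "upsample with a stride of 2";
    the actual operator used by the authors may differ.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(deep, shallow, weights, bias):
    """Stack upsampled deep channels on shallow channels, then apply a 1x1 convolution.

    deep:    list of channels, each H x W
    shallow: list of channels, each 2H x 2W
    weights: one list of per-input-channel weights for each output channel
    bias:    one scalar per output channel
    """
    stacked = [upsample2x(ch) for ch in deep] + shallow
    height, width = len(stacked[0]), len(stacked[0][0])
    fused = []
    for w_out, b_out in zip(weights, bias):
        # A 1x1 convolution is a per-pixel weighted sum over the stacked channels.
        fused.append([[b_out + sum(w * ch[y][x] for w, ch in zip(w_out, stacked))
                       for x in range(width)]
                      for y in range(height)])
    return fused


# Toy inputs (hypothetical): one deep channel of 2 x 2, one shallow channel of 4 x 4.
deep = [[[1.0, 2.0], [3.0, 4.0]]]
shallow = [[[0.5] * 4 for _ in range(4)]]
fused = fuse(deep, shallow, weights=[[1.0, 2.0]], bias=[0.0])
```

The fused map has the resolution of the shallower layer, which is the sense in which the fusion keeps spatial detail while still mixing in the deeper, more semantic features.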
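
The loss term described in the abstract compares two spatial affinity matrices, one built from the input image and one from the predicted label map, using cosine distance. The abstract does not define how either matrix is constructed, so the sketch below makes two loudly labeled assumptions: a Gaussian kernel on pixel intensities for the image, and the inner product of per-pixel class-probability vectors for the prediction. All function names and toy values are hypothetical, and only the forward loss value is computed (no backpropagation).

```python
import math


def image_affinity(pixels, sigma=1.0):
    """Pairwise Gaussian affinity between pixel intensities (assumed form)."""
    return [[math.exp(-((p - q) ** 2) / (2.0 * sigma ** 2)) for q in pixels]
            for p in pixels]


def prediction_affinity(probs):
    """Pairwise inner product of per-pixel class-probability vectors (assumed form)."""
    return [[sum(a * b for a, b in zip(p, q)) for q in probs] for p in probs]


def cosine_distance(a, b):
    """1 - cosine similarity between two matrices treated as flat vectors."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy example (hypothetical): four flattened pixels forming a dark and a bright region.
pixels = [0.0, 0.1, 0.9, 1.0]
# Prediction that groups pixels the same way the image does.
agree = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Prediction that splits each region across two classes.
disagree = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

image_aff = image_affinity(pixels, sigma=0.2)
loss_agree = cosine_distance(image_aff, prediction_affinity(agree))
loss_disagree = cosine_distance(image_aff, prediction_affinity(disagree))
```

Under these assumptions, a prediction whose regions follow the image structure yields a smaller distance than one that fragments them, which matches the stated goal of preserving regional connectivity in the label map.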

Bibliographic record

  • Source
  • Author affiliations

    Center for OPTical Imagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, China;

    Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, China, and University of Chinese Academy of Sciences, China;


  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Semantic image segmentation; spatial resolution; feature pyramid network; discriminative capability;

