IEEE Transactions on Pattern Analysis and Machine Intelligence
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation


Abstract

We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation, termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well-known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient in terms of both memory and computational time during inference. It also has significantly fewer trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scene and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance, with competitive inference time and the most efficient inference memory usage compared to the other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
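The index-based upsampling described in the abstract can be sketched as follows. This is an illustrative pure-Python toy (not the authors' Caffe implementation, and the function names are hypothetical): the encoder's 2x2 max-pooling records the position of each maximum, and the decoder scatters each pooled value back to that exact position, yielding the sparse map that is then densified by trainable convolutions.

```python
# Sketch of SegNet-style pooling/unpooling on a single-channel 2D map.
# Assumed 2x2 windows with stride 2 and an even-sized input, for brevity.

def max_pool_2x2_with_indices(x):
    """2x2 max pooling that also records the (row, col) of each maximum."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            window = [(x[r][c], (r, c)) for r in (i, i + 1) for c in (j, j + 1)]
            val, idx = max(window)  # value and where it came from
            prow.append(val)
            irow.append(idx)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices

def unpool_with_indices(pooled, indices, out_h, out_w):
    """Non-linear upsampling: scatter each pooled value back to its
    recorded position; all other entries stay zero (a sparse map)."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for val, (r, c) in zip(prow, irow):
            out[r][c] = val
    return out

x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 2, 2],
     [3, 0, 1, 4]]
pooled, idx = max_pool_2x2_with_indices(x)   # pooled == [[4, 5], [3, 4]]
up = unpool_with_indices(pooled, idx, 4, 4)  # 4 nonzeros, at the argmax positions
```

Because the argmax positions are reused rather than learned, the decoder needs no parameters for upsampling itself, which is the source of SegNet's memory efficiency noted above.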
