Journal: Image and Vision Computing

Efficient pyramid context encoding and feature embedding for semantic segmentation

Abstract

For real-world applications of semantic segmentation, inference speed and memory usage are two important factors. To address these constraints, we propose a lightweight feature pyramid encoding network (FPENet) for semantic segmentation that offers a good trade-off between accuracy and speed. We use a series of feature pyramid encoding (FPE) blocks to encode context at multiple scales in the encoder. Each FPE block consists of different depthwise dilated convolutions that act as a spatial pyramid to extract features while reducing computational cost. During training, a one-shot neural architecture search algorithm is adopted to find the optimal structure for each FPE block from a large search space at a small search cost. After the encoder search, a mutual embedding upsample module consisting of two attention blocks is introduced in the decoder. This encoder-decoder attention mechanism helps to efficiently aggregate high-level semantic features and low-level spatial details. The proposed network outperforms existing real-time methods with fewer parameters and higher inference speed on the Cityscapes and CamVid benchmark datasets. Specifically, it achieves 72.3% mean IoU on the Cityscapes test set with only 0.4 M parameters at 192.6 FPS on an Nvidia Titan V100 GPU, and 73.4% mean IoU at 116.2 FPS when running on higher-resolution images. (c) 2021 Elsevier B.V. All rights reserved.
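
To make the cost argument for depthwise dilated convolutions concrete, the following is a minimal arithmetic sketch, not the authors' implementation. It assumes the pyramid is formed by 3x3 kernels at several dilation rates. The channel width (64), the dilation rates (1, 2, 4, 8), and the helper names `conv_params` and `effective_kernel` are hypothetical values chosen for illustration and are not taken from the abstract.

```python
def conv_params(in_ch, out_ch, k, groups=1):
    """Weight count of a 2-D convolution with a k x k kernel (bias ignored)."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * k * k


def effective_kernel(k, dilation):
    """Side length of the input window covered by a dilated k x k kernel."""
    return k + (k - 1) * (dilation - 1)


channels = 64              # hypothetical channel width, not from the paper
dilations = [1, 2, 4, 8]   # hypothetical pyramid of dilation rates

# Standard 3x3 convolution: every output channel sees every input channel.
standard = conv_params(channels, channels, 3)

# Depthwise 3x3 convolution: one filter per channel (groups == channels).
depthwise = conv_params(channels, channels, 3, groups=channels)

# Dilation widens the covered window without adding any weights.
spans = [effective_kernel(3, d) for d in dilations]

print(standard, depthwise, spans)
```

Under these assumptions the depthwise variant needs 576 weights against 36,864 for the standard convolution, while the four dilation rates cover windows of 3, 5, 9, and 17 pixels per side at no extra parameter cost. This is the sense in which a stack of depthwise dilated convolutions can behave like a spatial pyramid cheaply.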
