
Fully Convolutional Adaptation Networks for Semantic Segmentation


Abstract

The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets. Nevertheless, collecting expert-labeled datasets, especially with pixel-level annotations, is an extremely expensive process. An appealing alternative is to render synthetic data (e.g., from computer games) and generate ground truth automatically. However, simply applying models learnt on synthetic images may lead to high generalization error on real images due to domain shift. In this paper, we address this issue from the perspectives of both visual appearance-level and representation-level domain adaptation. The former adapts source-domain images to appear as if drawn from the 'style' of the target domain, and the latter attempts to learn domain-invariant representations. Specifically, we present Fully Convolutional Adaptation Networks (FCAN), a novel deep architecture for semantic segmentation which combines Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN). AAN learns a transformation from one domain to the other in pixel space, while RAN is optimized in an adversarial manner to maximally fool a domain discriminator with the learnt source and target representations. Extensive experiments are conducted on the transfer from GTA5 (game videos) to Cityscapes (urban street scenes) for semantic segmentation, and our proposal achieves superior results compared to state-of-the-art unsupervised adaptation techniques. More remarkably, we obtain a new record: an mIoU of 47.5% on BDDS (drive-cam videos) in an unsupervised setting.
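To make the representation-level adaptation described in the abstract concrete, below is a minimal PyTorch sketch of the adversarial idea behind RAN: the segmentation network is trained with supervision on source images only, while a domain discriminator is trained to tell source features from target features and the segmentation network is simultaneously pushed to fool it. The network interface (`seg_net` returning a feature map and per-pixel logits), channel sizes, and the loss weight `lambda_adv` are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of adversarial representation adaptation (the RAN component).
# Assumes seg_net(images) -> (features, per-pixel class logits); all sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainDiscriminator(nn.Module):
    """Predicts, per spatial location, whether features come from the source or target domain."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 64, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),  # domain logit map
        )

    def forward(self, feat):
        return self.net(feat)


def adaptation_step(seg_net, discriminator, opt_seg, opt_disc,
                    src_images, src_labels, tgt_images, lambda_adv=0.001):
    """One training step: supervised segmentation loss on source images plus an
    adversarial term that makes target features indistinguishable from source features."""
    # ---- update the segmentation network ----
    opt_seg.zero_grad()
    src_feat, src_logits = seg_net(src_images)
    seg_loss = F.cross_entropy(src_logits, src_labels)  # supervision only on the source domain

    tgt_feat, _ = seg_net(tgt_images)
    tgt_domain = discriminator(tgt_feat)
    # Fool the discriminator: push target features toward the "source" label (1).
    adv_loss = F.binary_cross_entropy_with_logits(
        tgt_domain, torch.ones_like(tgt_domain))
    (seg_loss + lambda_adv * adv_loss).backward()
    opt_seg.step()

    # ---- update the domain discriminator ----
    opt_disc.zero_grad()
    src_domain = discriminator(src_feat.detach())
    tgt_domain = discriminator(tgt_feat.detach())
    disc_loss = 0.5 * (
        F.binary_cross_entropy_with_logits(src_domain, torch.ones_like(src_domain)) +
        F.binary_cross_entropy_with_logits(tgt_domain, torch.zeros_like(tgt_domain)))
    disc_loss.backward()
    opt_disc.step()
    return seg_loss.item(), adv_loss.item(), disc_loss.item()
```

In this sketch the appearance-level step (AAN, which restyles source images in pixel space) is assumed to have been applied to `src_images` beforehand; only the feature-level adversarial game is shown.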
