IEEE Conference on Computer Vision and Pattern Recognition

Deep Structured Scene Parsing by Learning with Image Descriptions

Abstract

This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception. We propose a deep architecture consisting of two networks: i) a convolutional neural network (CNN) extracting the image representation for pixelwise object labeling and ii) a recursive neural network (RNN) discovering the hierarchical object structure and the inter-object relations. Rather than relying on elaborative user annotations (e.g., manually labeling semantic maps and relations), we train our deep model in a weakly-supervised manner by leveraging the descriptive sentences of the training images. Specifically, we decompose each sentence into a semantic tree consisting of nouns and verb phrases, and facilitate these trees discovering the configurations of the training images. Once these scene configurations are determined, then the parameters of both the CNN and RNN are updated accordingly by back propagation. The entire model training is accomplished through an Expectation-Maximization method. Extensive experiments suggest that our model is capable of producing meaningful and structured scene configurations and achieving more favorable scene labeling performance on PASCAL VOC 2012 over other state-of-the-art weakly-supervised methods.
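As a rough illustration of the two-network design described above, the sketch below (PyTorch, not the authors' implementation) pairs a small CNN that produces per-pixel class scores with a recursive composer that merges region features into a hierarchy and scores inter-object relations. The layer sizes, class and relation counts, the greedy merging rule, and all names are assumptions made for illustration only; the EM-style weakly-supervised training against sentence-derived semantic trees is not shown.

```python
# Minimal sketch (assumed architecture, not the authors' code): a CNN for
# pixel-wise labeling plus a recursive network that composes regions into a
# hierarchy and predicts the relation between merged children.
import torch
import torch.nn as nn

NUM_CLASSES = 21     # e.g. PASCAL VOC 2012 categories (assumed)
NUM_RELATIONS = 8    # hypothetical inter-object relation vocabulary
FEAT_DIM = 64

class PixelLabelCNN(nn.Module):
    """CNN mapping an image to per-pixel class scores (semantic map)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, FEAT_DIM, 3, padding=1), nn.ReLU(),
            nn.Conv2d(FEAT_DIM, FEAT_DIM, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(FEAT_DIM, NUM_CLASSES, 1)

    def forward(self, image):                   # image: (B, 3, H, W)
        feat = self.backbone(image)             # (B, FEAT_DIM, H, W)
        return self.classifier(feat), feat      # per-pixel scores + features

class RecursiveComposer(nn.Module):
    """Recursive network: merge two region features into a parent node and
    score the relation between the two children."""
    def __init__(self):
        super().__init__()
        self.compose = nn.Linear(2 * FEAT_DIM, FEAT_DIM)
        self.relation = nn.Linear(2 * FEAT_DIM, NUM_RELATIONS)
        self.merge_score = nn.Linear(FEAT_DIM, 1)

    def forward(self, left, right):             # left/right: (FEAT_DIM,)
        pair = torch.cat([left, right], dim=-1)
        parent = torch.tanh(self.compose(pair)) # parent node representation
        rel_logits = self.relation(pair)        # relation between the children
        score = self.merge_score(parent)        # plausibility of this merge
        return parent, rel_logits, score

def greedy_parse(region_feats, composer):
    """Greedily build a binary hierarchy over object regions, keeping the
    highest-scoring merge at each step (one simple inference strategy)."""
    nodes, tree = list(region_feats), []
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                parent, rel, score = composer(nodes[i], nodes[j])
                if best is None or score.item() > best[0]:
                    best = (score.item(), i, j, parent, rel)
        _, i, j, parent, rel = best
        tree.append((i, j, rel.argmax().item()))        # record merge + relation
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return tree

if __name__ == "__main__":
    cnn, composer = PixelLabelCNN(), RecursiveComposer()
    scores, feat = cnn(torch.randn(1, 3, 64, 64))
    print(scores.shape)                                  # (1, NUM_CLASSES, 64, 64)
    # Pool CNN features over a few hypothetical object regions, then parse.
    regions = [feat[0, :, :32, :32].mean(dim=(1, 2)),
               feat[0, :, :32, 32:].mean(dim=(1, 2)),
               feat[0, :, 32:, :].mean(dim=(1, 2))]
    print(greedy_parse(regions, composer))
```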