
Learning Affinity to Parse Images

Abstract

Recent years have witnessed the success of deep learning models such as convolutional neural networks (ConvNets) on numerous vision tasks. However, ConvNets have a significant limitation: they lack effective internal structures for explicitly learning pairwise relations between image pixels. This creates two fundamental bottlenecks for many vision problems involving label and map regression, as well as image reconstruction: (a) the pixels of an image contain a large amount of redundancy that ConvNets cannot exploit efficiently, because they predict each pixel independently, and (b) the convolutional operation cannot effectively solve problems that rely on the similarities of pixel pairs, e.g., image pixel propagation and shape/mask refinement.

This thesis focuses on how to learn pairwise relations between image pixels within jointly trained, end-to-end learnable neural networks. This is achieved by two different approaches: (a) formulating the conditional random field (CRF) objective as a non-structured objective that can be implemented in a ConvNet as an additional loss, and (b) developing spatial-propagation-based, deep-learning-friendly structures that learn the pairwise relations explicitly.

In the first approach, we develop a novel multi-objective learning method that optimizes a single unified deep convolutional network with two distinct non-structured loss functions: one encoding the unary label likelihoods and the other encoding the pairwise label dependencies. We apply this framework to face parsing; experiments on the LFW and Helen datasets demonstrate that the additional pairwise loss significantly improves labeling performance compared to a single-loss ConvNet with the same architecture.

In the second approach, we explore how to learn pairwise relations using spatial propagation networks instead of additional loss functions. Unlike a ConvNet, the propagation module is a spatially recurrent network with a linear transformation between adjacent rows and columns. We propose two typical structures: a one-way connection using one-dimensional propagation, and a three-way connection using two-dimensional propagation. For both models, the linear weights are spatially variant output maps that can be learned from any ConvNet. Since such modules are fully differentiable, they are flexible enough to be inserted into any type of neural network. We prove that while both structures can formulate global affinities, the one-way connection constructs a sparse affinity matrix, whereas the three-way connection forms a much denser one. Both structures demonstrate their effectiveness over a wide range of vision problems, but the three-way connection is more powerful on challenging tasks (e.g., general object segmentation). We show that a well-learned affinity benefits numerous computer vision applications, including but not limited to image filtering and denoising, pixel/color interpolation, face parsing, and general semantic segmentation. Compared to graphical-model-based pairwise learning, the spatial propagation network is a good alternative within deep-learning-based frameworks.
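To make the propagation idea concrete, the following is a minimal NumPy sketch of the one-way (one-dimensional) connection described above: a left-to-right sweep in which each pixel becomes a convex combination of its own input value and the already-propagated value of its left neighbor. The weight map `w` stands in for the spatially variant maps that the thesis learns with a ConvNet; here it is fixed by hand purely for illustration, and the function name is hypothetical.

```python
import numpy as np

def one_way_propagate(x, w):
    """One-dimensional (left-to-right) linear propagation.

    x : (H, W) input map to be propagated/filtered.
    w : (H, W) spatially variant weights in [0, 1]; in the thesis these
        maps are produced by a ConvNet, here they are given by hand.

    Recurrence along each row:
        h[i, 0] = x[i, 0]
        h[i, j] = (1 - w[i, j]) * x[i, j] + w[i, j] * h[i, j - 1]
    """
    h = x.astype(np.float64).copy()
    for j in range(1, x.shape[1]):
        h[:, j] = (1.0 - w[:, j]) * x[:, j] + w[:, j] * h[:, j - 1]
    return h

# Toy usage: "seed" values on the left edge spread across the columns,
# decaying geometrically because w < 1 everywhere.
x = np.zeros((2, 6))
x[:, 0] = 1.0
w = np.full((2, 6), 0.8)
print(one_way_propagate(x, w))
```

Stacking such sweeps in the four principal directions, with the weights predicted per pixel, is what lets the module express a global (if sparse) affinity while remaining fully differentiable.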

Bibliographic record

  • Author: Liu, Sifei
  • Affiliation: University of California, Merced
  • Degree grantor: University of California, Merced
  • Subjects: Computer science; Electrical engineering; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 143 p.
  • Total pages: 143
  • Format: PDF
  • Language: eng
