
Learning Affinity to Parse Images

Abstract

Recent years have witnessed the success of deep learning models such as convolutional neural networks (ConvNets) on numerous vision tasks. However, ConvNets have a significant limitation: they lack effective internal structures for explicitly learning pairwise relations between image pixels. This creates two fundamental bottlenecks for many vision problems involving label and map regression, as well as image reconstruction: (a) the pixels of an image contain a large amount of redundancy that ConvNets cannot exploit efficiently, because they predict each pixel independently, and (b) the convolutional operation cannot effectively solve problems that rely on the similarities of pixel pairs, e.g., image pixel propagation and shape/mask refinement.

This thesis focuses on how to learn pairwise relations between image pixels within jointly trained, end-to-end learnable neural networks. This is achieved by two different approaches: (a) formulating the conditional random field (CRF) objective as a non-structured objective that can be implemented in a ConvNet as an additional loss, and (b) developing spatial-propagation-based, deep-learning-friendly structures that learn the pairwise relations explicitly.

In the first approach, we develop a novel multi-objective learning method that optimizes a single unified deep convolutional network with two distinct non-structured loss functions: one encoding the unary label likelihoods and the other encoding the pairwise label dependencies. We apply this framework to face parsing; experiments on the LFW and Helen datasets demonstrate that the additional pairwise loss significantly improves labeling performance compared to a single-loss ConvNet with the same architecture.

In the second approach, we explore how to learn pairwise relations using spatial propagation networks instead of additional loss functions. Unlike a ConvNet, the propagation module is a spatially recurrent network with a linear transformation between adjacent rows and columns. We propose two typical structures: a one-way connection using one-dimensional propagation, and a three-way connection using two-dimensional propagation. For both models, the linear weights are spatially variant output maps that can be learned from any ConvNet. Since such modules are fully differentiable, they are flexible enough to be inserted into any type of neural network. We prove that while both structures can formulate global affinities, the one-way connection constructs a sparse affinity matrix, whereas the three-way connection forms a much denser one. Both structures demonstrate their effectiveness over a wide range of vision problems, but the three-way connection is more powerful on challenging tasks (e.g., general object segmentation). We show that a well-learned affinity benefits numerous computer vision applications, including but not limited to image filtering and denoising, pixel/color interpolation, face parsing, and general semantic segmentation. Compared to graphical-model-based pairwise learning, the spatial propagation network is a good alternative within deep-learning-based frameworks.
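To make the propagation idea concrete, the following is a minimal NumPy sketch of the one-way (one-dimensional) connection described above: a left-to-right sweep in which each pixel becomes a convex combination of its own input value and the already-propagated value of its left neighbor. The weight map `w` stands in for the spatially variant maps that the thesis learns with a ConvNet; here it is fixed by hand purely for illustration, and the function name is hypothetical.

```python
import numpy as np

def one_way_propagate(x, w):
    """One-dimensional (left-to-right) linear propagation.

    x : (H, W) input map to be propagated/filtered.
    w : (H, W) spatially variant weights in [0, 1]; in the thesis these
        maps are produced by a ConvNet, here they are given by hand.

    Recurrence along each row:
        h[i, 0] = x[i, 0]
        h[i, j] = (1 - w[i, j]) * x[i, j] + w[i, j] * h[i, j - 1]
    """
    h = x.astype(np.float64).copy()
    for j in range(1, x.shape[1]):
        h[:, j] = (1.0 - w[:, j]) * x[:, j] + w[:, j] * h[:, j - 1]
    return h

# Toy usage: "seed" values on the left edge spread across the columns,
# decaying geometrically because w < 1 everywhere.
x = np.zeros((2, 6))
x[:, 0] = 1.0
w = np.full((2, 6), 0.8)
print(one_way_propagate(x, w))
```

Stacking such sweeps in the four principal directions, with the weights predicted per pixel, is what lets the module express a global (if sparse) affinity while remaining fully differentiable.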

Bibliographic record

  • Author: Liu, Sifei
  • Affiliation: University of California, Merced
  • Degree grantor: University of California, Merced
  • Subjects: Computer science; Electrical engineering; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 143 p.
  • Total pages: 143
  • Format: PDF
  • Language: eng
