
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning


Abstract

The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity increases, so does the demand for large amounts of labeled data to train the model, which entails a costly human annotation effort. Modern vision networks often rely on a two-stage training process to satisfy this thirst for training data: the first stage, pretraining, is done on a general vision task for which a large collection of annotated data is available. This primes the network with semantic knowledge that is general to a wide variety of vision tasks. The second stage, fine-tuning, continues the training of the network, this time on the target task, where annotations are often scarce. The reliance on supervised pretraining anchors future progress to a constant human annotation effort, especially for new or ever-changing domains. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised pretraining. In particular, we propose to use self-supervised automatic image colorization.

We begin by evaluating two baselines for leveraging unlabeled data for representation learning. The first trains a mixture model for each layer in a greedy manner. We show that this method excels on relatively simple tasks in the small-sample regime. It can also be used to produce a well-organized feature space that is equivariant to cyclic transformations, such as rotation. The second is the autoencoder, which is trained end-to-end and thus avoids the main concerns of greedy training. However, its per-pixel loss is not a good analog to perceptual similarity, and the representation suffers as a consequence. Both of these methods leave a wide gap between unsupervised and supervised pretraining.

As a precursor to our improvements in unsupervised representation learning, we develop a novel method for automatic colorization of grayscale images and focus initially on its use as a graphics application. We set a new state of the art that handles a wide variety of scenes and contexts. Our method makes it possible to revitalize old black-and-white photography without requiring human effort or expertise. In order for the model to appropriately re-color a grayscale object, it must first be able to identify that object. Since such high-level semantic knowledge benefits colorization, we found success employing the two-stage training process with supervised pretraining. This raises the question: if colorization and classification both benefit from the same visual semantics, can we reverse the relationship and use colorization to benefit classification?

Using colorization as a pretraining method does not require data annotations, since labeled training pairs are constructed automatically by separating intensity and color. A task of this kind is called self-supervised. Colorization joins a growing family of self-supervision methods as a front-runner with state-of-the-art results. We show that up to a certain sample size, labeled data can be entirely replaced by a large collection of unlabeled data. If these techniques continue to improve, they may one day supplant supervised pretraining altogether. We provide a significant step toward this goal.
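The key mechanism is that every color image already contains its own supervision. Below is a minimal sketch of how such training pairs can be constructed, assuming NumPy and scikit-image are available; the function name and setup are illustrative, not taken from the dissertation.

```python
# A minimal sketch of constructing colorization training pairs, assuming
# NumPy and scikit-image; make_colorization_pair is an illustrative name,
# not a function from the dissertation.
import numpy as np
from skimage import color

def make_colorization_pair(rgb_image):
    """Split an RGB image into a grayscale input and a color target.

    The Lab color space separates intensity (L) from chromaticity (a, b),
    so every unlabeled color photo yields an (input, target) pair with
    no human annotation.
    """
    lab = color.rgb2lab(rgb_image)   # H x W x 3 float array
    lightness = lab[..., :1]         # network input: the L channel
    chroma = lab[..., 1:]            # prediction target: the a, b channels
    return lightness, chroma

# Usage: a stand-in random image; in practice, any color photo works.
rgb = np.random.rand(256, 256, 3)
x, y = make_colorization_pair(rgb)
assert x.shape == (256, 256, 1) and y.shape == (256, 256, 2)
```

Because the intensity channel is exactly what a grayscale photograph records, a network trained this way can be applied directly to legacy black-and-white images.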
As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization in the representation. We explore a wide range of combination methods, both offline methods that fuse or distill already-trained networks and online methods that actively train multiple tasks together. In controlled experiments, we demonstrate significant gains using both offline and online methods. However, the benefits do not translate to self-supervised pretraining, leaving multi-proxy self-supervision an open and interesting problem.
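The online combination setting amounts to multi-task training against a shared backbone. The sketch below illustrates one generic way to set this up; the module names, batch layout, and per-task loss weighting are assumptions for illustration, not the dissertation's exact architecture.

```python
# A minimal sketch of the "online" combination setting, where several proxy
# tasks share one backbone and are trained jointly. Generic PyTorch-style
# multi-task loop under assumed module and batch layouts.
import torch
from torch import nn

class MultiProxyModel(nn.Module):
    def __init__(self, backbone, heads):
        super().__init__()
        self.backbone = backbone           # shared representation network
        self.heads = nn.ModuleDict(heads)  # one output head per proxy task

    def forward(self, x):
        features = self.backbone(x)
        return {name: head(features) for name, head in self.heads.items()}

def multi_proxy_step(model, optimizer, batch, criteria, weights):
    """One optimization step on the weighted sum of all proxy losses."""
    optimizer.zero_grad()
    outputs = model(batch["input"])
    total = sum(weights[task] * criteria[task](outputs[task], batch[task])
                for task in criteria)
    total.backward()   # gradients from every task reach the shared backbone
    optimizer.step()
    return float(total)
```

The design choice at stake is whether gradients from different proxy tasks reinforce or interfere with one another in the shared backbone, which is what the controlled experiments above probe.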

Bibliographic Details

  • Author: Larsson, Gustav Martin.
  • Author Affiliation: The University of Chicago.
  • Degree Grantor: The University of Chicago.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2017
  • Pagination: 131 p.
  • Total Pages: 131
  • Format: PDF
  • Language: eng
  • CLC Classification: Religion
  • Keywords
