
Domain Adaptation and Privileged Information for Visual Recognition



Abstract

The automatic identification of entities such as objects, people, or their actions in visual data such as images or video has improved significantly, and is now being deployed in access control, social media, online retail, autonomous vehicles, and several other applications. This visual recognition capability leverages supervised learning techniques, which require large amounts of labeled training data drawn from the target distribution representative of the particular task at hand. However, collecting such training data may be expensive, too time-consuming, or even impossible. In this work, we introduce several novel approaches aimed at compensating for the lack of target training data. Rather than leveraging prior knowledge to build task-specific models, which are typically easier to train, we focus on developing general visual recognition techniques in which prior knowledge takes the form of additional information available during training. Depending on the nature of that information, the learning problem becomes domain adaptation (DA), domain generalization (DG), learning using privileged information (LUPI), or domain adaptation with privileged information (DAPI).

When a few target data samples are available, together with additional information in the form of labeled data from a different source, the learning problem becomes domain adaptation. Unlike previous DA work, we introduce two novel approaches for the few-shot learning scenario, which require only very few labeled target samples; even a single one can be very effective. The first method exploits a Siamese deep neural network architecture to learn an embedding in which visual categories from the source and target distributions are semantically aligned and yet maximally separated. The second approach instead extends adversarial learning to simultaneously maximize the confusion between source and target domains while achieving semantic alignment. (Illustrative sketches of both ideas follow the abstract.)

In the complete absence of target data, several cheaply available source datasets related to the target distribution can be leveraged as additional information for learning a task. This is the domain generalization setting. We introduce the first deep learning approach to the DG problem, extending a Siamese network architecture to learn a representation of visual categories that is invariant with respect to the sources, while imposing semantic alignment and class separation to maximize generalization performance on unseen target domains.

There are situations in which target training data comes equipped with additional information that can be modeled as an auxiliary view of the data but that, unfortunately, is not available during testing. This is the LUPI scenario. We introduce a novel framework based on the information bottleneck that leverages the auxiliary view to improve the performance of visual classifiers. We do so by introducing a formulation that is general, in the sense that it can be used with any visual classifier. (The underlying bottleneck objective is sketched after the abstract.)

Finally, when the available target data is unlabeled and there is closely related labeled source data that is also equipped with an auxiliary view as additional information, we pose the question of how to leverage the source data views to train visual classifiers for unseen target data. This is the DAPI scenario. We extend the information-bottleneck-based LUPI framework to learn visual classifiers in DAPI settings and show that privileged information can be leveraged to improve learning on new domains. Like the LUPI formulation, the novel DAPI framework is general and can be used with any visual classifier.

Every use of auxiliary information has been validated extensively on publicly available benchmark datasets, and several new state-of-the-art accuracy results have been established. Example application domains include visual object recognition from RGB images and from depth data, handwritten digit recognition, and gesture recognition from video.
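To make the semantic alignment and separation idea concrete, below is a minimal PyTorch sketch of a contrastive loss over cross-domain pairs of embeddings, of the kind the Siamese few-shot DA and DG methods rely on. The function name, the margin value, and the all-pairs pairing scheme are illustrative assumptions, not the dissertation's exact formulation.

    import torch
    import torch.nn.functional as F

    def alignment_separation_loss(z_src, z_tgt, y_src, y_tgt, margin=1.0):
        # z_src, z_tgt: (n_src, d) and (n_tgt, d) embeddings produced by the
        # shared (Siamese) network; y_src, y_tgt: integer class labels.
        d = torch.cdist(z_src, z_tgt)                        # pairwise Euclidean distances
        same = (y_src.unsqueeze(1) == y_tgt.unsqueeze(0)).float()
        align = same * d.pow(2)                              # pull same-class pairs together
        separate = (1.0 - same) * F.relu(margin - d).pow(2)  # push others >= margin apart
        return (align + separate).mean()

In training, a term like this would be added to an ordinary classification loss on the labeled samples, so the embedding stays discriminative while the two domains are aligned class by class.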
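The adversarial variant instead maximizes domain confusion. A standard way to implement confusion is a gradient-reversal layer in front of a domain discriminator, as sketched below; the layer sizes and the plain binary discriminator are assumptions for illustration, and the dissertation's few-shot adversarial method discriminates over grouped pairs of samples, which this simplification omits.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity on the forward pass; flips the gradient sign on the way
        # back, so the embedding is updated to fool the discriminator.
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            return grad.neg()

    embed = nn.Sequential(nn.Linear(256, 64), nn.ReLU())  # shared embedding
    clf = nn.Linear(64, 10)    # label classifier (semantic alignment)
    disc = nn.Linear(64, 2)    # source-vs-target domain discriminator
    ce = nn.CrossEntropyLoss()

    def total_loss(x_src, y_src, x_tgt, y_tgt):
        z_src, z_tgt = embed(x_src), embed(x_tgt)
        # One classifier head serves both domains, aligning the classes.
        cls = ce(clf(z_src), y_src) + ce(clf(z_tgt), y_tgt)
        # The discriminator is trained to separate the domains, while the
        # reversed gradients train the embedding to confuse it.
        z = torch.cat([GradReverse.apply(z_src), GradReverse.apply(z_tgt)])
        dom = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
        return cls + ce(disc(z), dom)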

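The LUPI and DAPI frameworks build on the information bottleneck. For reference, the canonical bottleneck objective learns a stochastic representation T of the input X that is maximally informative about the label Y while being maximally compressed; in LaTeX,

    \min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)

where I(\cdot\,;\cdot) denotes mutual information and \beta > 0 trades compression against predictiveness. How the auxiliary (privileged) view enters this objective is specific to the dissertation and is not reproduced here; the key constraint is that the view may shape the representation only during training, never at test time.
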
Bibliographic Record

  • Author

    Motiian, Saeid.

  • Affiliation

    West Virginia University.

  • Degree grantor: West Virginia University.
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2019
  • Pages: 124 p.
  • Total pages: 124
  • Format: PDF
  • Language: English (eng)

