
Fine-Grained Visual-Textual Representation Learning


Abstract

Fine-grained image classification is to recognize hundreds of subcategories belonging to the same basic-level category, which is a highly challenging task due to the quite subtle visual distinctions among similar subcategories. Most existing methods learn part detectors to discover discriminative regions for better performance. However, not all localized parts are beneficial and indispensable for classification, and the setting of the number of part detectors relies heavily on prior knowledge as well as experimental results. As is well known, when we describe the object of an image in natural language, we focus only on its pivotal characteristics and rarely pay attention to common characteristics or the background areas. This is an involuntary transfer from human visual attention to textual attention, which means that textual attention tells us how many and which parts are discriminative and significant. Therefore, the textual attention of natural language descriptions can help us discover visual attention in the image. Inspired by this, we propose a visual-textual attention driven fine-grained representation learning (VTA) approach, whose main contributions are: (1) Fine-grained visual-textual pattern mining is devoted to discovering discriminative visual-textual pairwise information for boosting classification, by jointly modeling vision and text with generative adversarial networks (GANs), which automatically and adaptively discovers discriminative parts. (2) Visual-textual representation learning jointly combines visual and textual information, preserving the intra-modality and inter-modality information to generate a complementary fine-grained representation, which further improves classification performance. Experiments on the two widely-used datasets demonstrate the effectiveness of our VTA approach, which achieves the best classification accuracy.
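To make the second contribution concrete, the following is a minimal, hypothetical sketch (not the authors' released VTA code) of combining a visual feature and a textual feature into one joint representation for fine-grained classification; the module names, feature dimensions, and fusion by concatenation are assumptions chosen only for illustration.

```python
# Hypothetical sketch of joint visual-textual representation learning.
# Dimensions and architecture are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class JointVisualTextualClassifier(nn.Module):
    def __init__(self, visual_dim=2048, text_dim=768, joint_dim=512, num_classes=200):
        super().__init__()
        # Project each modality into a shared space (intra-modality encoding).
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, joint_dim), nn.ReLU())
        self.text_proj = nn.Sequential(nn.Linear(text_dim, joint_dim), nn.ReLU())
        # Classify from the concatenated (inter-modality) representation.
        self.classifier = nn.Linear(joint_dim * 2, num_classes)

    def forward(self, visual_feat, text_feat):
        v = self.visual_proj(visual_feat)   # image-side embedding
        t = self.text_proj(text_feat)       # description-side embedding
        joint = torch.cat([v, t], dim=-1)   # complementary joint representation
        return self.classifier(joint)

# Usage with random tensors standing in for CNN / text-encoder outputs.
model = JointVisualTextualClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 200])
```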

Bibliographic details

  • Authors

    Xiangteng He; Yuxin Peng;

  • Author affiliation
  • Year: 2020
  • Total pages
  • Original format: PDF
  • Language
  • CLC classification
