Journal: IEEE Transactions on Circuits and Systems for Video Technology

Fine-Grained Visual-Textual Representation Learning

Abstract

Fine-grained visual categorization aims to recognize hundreds of subcategories belonging to the same basic-level category, a highly challenging task because the visual distinctions among similar subcategories are subtle and local. Most existing methods learn part detectors to discover discriminative regions for better categorization performance. However, not all parts are beneficial or indispensable for categorization, and setting the number of part detectors relies heavily on prior knowledge and experimental validation. When we describe the object in an image with text, we mainly focus on its pivotal characteristics and rarely mention common characteristics or the background. This is an involuntary transfer from human visual attention to textual attention, so textual attention tells us how many and which parts are discriminative and significant for categorization. Textual attention can therefore help discover visual attention in the image. Inspired by this, we propose a fine-grained visual-textual representation learning (VTRL) approach with two main contributions: 1) fine-grained visual-textual pattern mining discovers discriminative visual-textual pairwise information by jointly modeling vision and text with generative adversarial networks, which automatically and adaptively discovers discriminative parts and boosts categorization performance; and 2) VTRL jointly combines visual and textual information, preserving intra-modality and inter-modality information to generate complementary fine-grained representations and further improve categorization performance. Comprehensive experiments on the widely used CUB-200-2011 and Oxford Flowers-102 datasets demonstrate the effectiveness of VTRL, which achieves the best categorization accuracy compared with state-of-the-art methods.
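
To make the joint visual-textual representation idea more concrete, below is a minimal, hypothetical PyTorch sketch of two encoders projecting image and text features into a shared space, trained with an intra-modality classification loss plus an inter-modality ranking loss. The module name VisualTextualEmbedder, the function joint_loss, the feature dimensions, and the loss formulation are illustrative assumptions rather than the paper's actual VTRL model; in particular, the GAN-based visual-textual pattern mining step is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualTextualEmbedder(nn.Module):
    def __init__(self, visual_dim=2048, text_dim=300, embed_dim=512, num_classes=200):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, embed_dim)   # e.g. pooled CNN features
        self.text_proj = nn.Linear(text_dim, embed_dim)       # e.g. averaged word vectors
        self.classifier = nn.Linear(embed_dim, num_classes)   # shared subcategory classifier

    def forward(self, visual_feat, text_feat):
        # Project both modalities into a common, L2-normalized embedding space.
        v = F.normalize(self.visual_proj(visual_feat), dim=1)
        t = F.normalize(self.text_proj(text_feat), dim=1)
        return v, t

def joint_loss(model, v_emb, t_emb, labels, margin=0.2):
    # Intra-modality: both embeddings should predict the correct subcategory.
    cls_loss = F.cross_entropy(model.classifier(v_emb), labels) + \
               F.cross_entropy(model.classifier(t_emb), labels)
    # Inter-modality: a matched image/text pair should score higher than
    # mismatched in-batch pairs by at least the margin (triplet-style ranking).
    sim = v_emb @ t_emb.t()                 # cosine similarities (embeddings are normalized)
    pos = sim.diag().unsqueeze(1)           # similarity of each matched pair
    hinge = F.relu(margin + sim - pos)
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    rank_loss = hinge.masked_fill(mask, 0.0).mean()
    return cls_loss + rank_loss

# Usage with random stand-in features (real inputs would come from pretrained
# visual and text encoders applied to CUB-200-2011 images and their descriptions):
model = VisualTextualEmbedder()
visual = torch.randn(8, 2048)
text = torch.randn(8, 300)
labels = torch.randint(0, 200, (8,))
v_emb, t_emb = model(visual, text)
loss = joint_loss(model, v_emb, t_emb, labels)
loss.backward()

The classification term preserves intra-modality (subcategory) information, while the ranking term pulls matched image/text embeddings together and pushes mismatched ones apart, which is one common way to preserve inter-modality correspondence in a shared space.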