Journal: International Journal of Computer Vision

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization



Abstract

Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are quite subtle and local, distinguishing them from each other is highly challenging even for humans. The localization of these distinctions is therefore essential for fine-grained visual categorization, and it raises two pivotal problems: (1) Which regions are discriminative and representative enough to distinguish a subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? It remains difficult to address these two problems adaptively and intelligently. Existing mainstream methods rely on artificial priors and experimental validation to discover which and how many regions to gaze at, which severely restricts their usability and scalability. To address these two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representations. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers the discriminative regions at multiple granularities in a hierarchical manner (the "which" problem) and determines the number of discriminative regions in an automatic and adaptive manner (the "how many" problem). (2) Multi-scale representation learning helps to localize regions at different scales as well as encode images at different scales, boosting the fine-grained visual categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward function. It allows the deep reinforcement …
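To make the semantic reward idea in contribution (3) concrete, the sketch below shows one plausible way a reward for a candidate region could jointly combine category information (classifier confidence on the ground-truth subcategory) and attention information (saliency mass covered by the region). The function name, the weighting factor alpha, the tensor shapes, and the specific formulation are illustrative assumptions, not the paper's actual reward definition.

import torch
import torch.nn.functional as F

def semantic_reward(region_logits, image_saliency, region_box, target_class, alpha=0.5):
    """Illustrative semantic reward for one candidate region (assumed formulation).

    region_logits: 1-D tensor of classifier logits for the cropped region (num_classes,)
    image_saliency: 2-D tensor of saliency values for the whole image (H, W)
    region_box: (x1, y1, x2, y2) integer coordinates of the region
    target_class: index of the ground-truth subcategory
    """
    # Category term: probability assigned to the ground-truth subcategory for this region.
    log_probs = F.log_softmax(region_logits, dim=-1)
    category_term = log_probs[target_class].exp()          # in (0, 1]

    # Attention term: fraction of total saliency mass that falls inside the region.
    x1, y1, x2, y2 = region_box
    inside = image_saliency[y1:y2, x1:x2].sum()
    attention_term = inside / (image_saliency.sum() + 1e-8)

    # Jointly consider attention and category information, as the abstract describes.
    return alpha * category_term + (1.0 - alpha) * attention_term

Under such a reward, a region-selection agent could keep proposing regions and stop once the marginal reward gain falls below a threshold, which is one simple way to realize the adaptive "how many regions" decision discussed above; the paper's actual stopping mechanism may differ.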
