Journal: International Journal of Computer Vision

Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization



Abstract

Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories that belong to the same superclass. Since the distinctions among similar subcategories are quite subtle and local, distinguishing them from each other is highly challenging even for humans. The localization of these distinctions is therefore essential for fine-grained visual categorization, and it raises two pivotal problems: (1) Which regions are discriminative and representative enough to distinguish a subcategory from the others? (2) How many discriminative regions are necessary to achieve the best categorization performance? It remains difficult to address these two problems adaptively and intelligently. Existing mainstream methods rely on artificial priors and experimental validation to discover which and how many regions to gaze at, which severely restricts their usability and scalability. To address these two problems, this paper proposes a multi-scale and multi-granularity deep reinforcement learning approach (M2DRL), which learns multi-granularity discriminative region attention and multi-scale region-based feature representations. Its main contributions are as follows: (1) Multi-granularity discriminative localization is proposed to localize the distinctions via a two-stage deep reinforcement learning approach, which discovers the discriminative regions at multiple granularities in a hierarchical manner (the "which" problem) and determines the number of discriminative regions in an automatic and adaptive manner (the "how many" problem). (2) Multi-scale representation learning helps to localize regions at different scales as well as encode images at different scales, boosting the fine-grained visual categorization performance. (3) A semantic reward function is proposed to drive M2DRL to fully capture salient and conceptual visual information by jointly considering attention and category information in the reward function. It allows the deep reinforcement …
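To make the semantic reward idea in contribution (3) concrete, the sketch below shows one plausible way a reward for a candidate region could jointly combine category information (classifier confidence on the ground-truth subcategory) and attention information (saliency mass covered by the region). The function name, the weighting factor alpha, the tensor shapes, and the specific formulation are illustrative assumptions, not the paper's actual reward definition.

import torch
import torch.nn.functional as F

def semantic_reward(region_logits, image_saliency, region_box, target_class, alpha=0.5):
    """Illustrative semantic reward for one candidate region (assumed formulation).

    region_logits: 1-D tensor of classifier logits for the cropped region (num_classes,)
    image_saliency: 2-D tensor of saliency values for the whole image (H, W)
    region_box: (x1, y1, x2, y2) integer coordinates of the region
    target_class: index of the ground-truth subcategory
    """
    # Category term: probability assigned to the ground-truth subcategory for this region.
    log_probs = F.log_softmax(region_logits, dim=-1)
    category_term = log_probs[target_class].exp()          # in (0, 1]

    # Attention term: fraction of total saliency mass that falls inside the region.
    x1, y1, x2, y2 = region_box
    inside = image_saliency[y1:y2, x1:x2].sum()
    attention_term = inside / (image_saliency.sum() + 1e-8)

    # Jointly consider attention and category information, as the abstract describes.
    return alpha * category_term + (1.0 - alpha) * attention_term

Under such a reward, a region-selection agent could keep proposing regions and stop once the marginal reward gain falls below a threshold, which is one simple way to realize the adaptive "how many regions" decision discussed above; the paper's actual stopping mechanism may differ.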
