International Conference on Artificial Neural Networks

Aggregating Rich Deep Semantic Features for Fine-Grained Place Classification



Abstract

This paper proposes a method that aggregates rich deep semantic features for fine-grained place classification. As is well known, the category of an image depends on its objects and text as well as on its various semantic regions, hierarchical structure, and spatial layout. However, most recently designed fine-grained classification systems ignore this: the complex multi-level semantic structure of images associated with fine-grained classes has not yet been well explored. Therefore, our approach is composed of two modules: a Content Estimator (CNE) and a Context Estimator (CXE). The CNE generates deep content features by encoding global visual cues of an image. The CXE obtains rich context features and consists of three child estimators: a Text Context Estimator (TCE), an Object Context Estimator (OCE), and a Scene Context Estimator (SCE). Given an input image, the TCE encodes text cues to identify word-level semantic information, the OCE extracts high-dimensional features and maps them to object semantic information, and the SCE captures hierarchical structure and spatial layout information by recognizing scene cues. To aggregate rich deep semantic features, we fuse the outputs of the CNE and CXE for fine-grained classification. To the best of our knowledge, this is the first work to leverage text information from an arbitrarily oriented scene text detector for extracting context information. Moreover, our method explores the fusion of semantic features and demonstrates that scene features provide information complementary to the other cues. The proposed approach achieves state-of-the-art performance on a fine-grained classification dataset, reaching 84.3% on Con-Text.
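The two-branch structure the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the estimator bodies are random stand-ins, and the feature dimensions, fusion-by-concatenation, and linear softmax classifier are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions; the paper does not specify them.
# Con-Text contains 28 place categories.
D_CONTENT, D_TEXT, D_OBJECT, D_SCENE, N_CLASSES = 512, 128, 256, 256, 28

def content_estimator(image):
    """CNE: stand-in for a CNN encoding global visual cues."""
    return rng.standard_normal(D_CONTENT)

def text_context_estimator(image):
    """TCE: stand-in for word-level features from a scene text detector."""
    return rng.standard_normal(D_TEXT)

def object_context_estimator(image):
    """OCE: stand-in for object semantic features."""
    return rng.standard_normal(D_OBJECT)

def scene_context_estimator(image):
    """SCE: stand-in for hierarchical structure / spatial layout features."""
    return rng.standard_normal(D_SCENE)

def classify(image, weights):
    """Fuse CNE and CXE outputs, then apply a softmax classifier.

    Concatenation is an assumed fusion scheme for this sketch.
    """
    fused = np.concatenate([
        content_estimator(image),          # CNE branch
        text_context_estimator(image),     # CXE: TCE
        object_context_estimator(image),   # CXE: OCE
        scene_context_estimator(image),    # CXE: SCE
    ])
    logits = weights @ fused
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

image = np.zeros((224, 224, 3))  # dummy input image
W = rng.standard_normal((N_CLASSES, D_CONTENT + D_TEXT + D_OBJECT + D_SCENE))
probs = classify(image, W)
```

The point of the sketch is the data flow: the content branch and the three context branches produce independent feature vectors that are joined before a single classification head.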
