International Conference on Artificial Neural Networks

Aggregating Rich Deep Semantic Features for Fine-Grained Place Classification


Abstract

This paper proposes a method that aggregates rich deep semantic features for fine-grained place classification. The category of an image depends on the objects and text it contains, as well as on its various semantic regions, hierarchical structure, and spatial layout. Most recently designed fine-grained classification systems ignore this, however: the complex multi-level semantic structure of images associated with fine-grained classes has not yet been well explored. In this work, our approach is therefore composed of two modules: a Content Estimator (CNE) and a Context Estimator (CXE). The CNE generates deep content features by encoding the global visual cues of an image. The CXE obtains rich context features and consists of three child estimators: a Text Context Estimator (TCE), an Object Context Estimator (OCE), and a Scene Context Estimator (SCE). Given an input image, the TCE encodes text cues to identify word-level semantic information, the OCE extracts high-dimensional features and maps them to object-level semantic information, and the SCE captures hierarchical structure and spatial layout information by recognizing scene cues. To aggregate rich deep semantic features, we fuse the outputs of the CNE and the CXE for fine-grained classification. To the best of our knowledge, this is the first work to leverage text information from an arbitrarily oriented scene-text detector for extracting context information. Moreover, our method explores the fusion of semantic features and demonstrates that scene features provide information complementary to the other cues. The proposed approach achieves state-of-the-art performance on a fine-grained classification dataset, reaching 84.3% on Con-Text.
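
To make the two-branch design concrete, the following is a minimal PyTorch sketch of the structure the abstract describes. Only the module names (CNE, CXE, TCE, OCE, SCE) and the fusion idea come from the paper; every implementation detail here is an assumption: the small convolutional backbone, the concatenation-based fusion, the input dimensions for the text, object, and scene cue vectors (300, 1000, 365), and the 28-way output head.

    import torch
    import torch.nn as nn

    class ContentEstimator(nn.Module):
        """CNE: encodes global visual cues into a deep content feature."""
        def __init__(self, feat_dim=512):
            super().__init__()
            # Assumed backbone: a tiny conv encoder standing in for the
            # unspecified CNN used in the paper.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(128, feat_dim)

        def forward(self, x):
            return self.fc(self.encoder(x).flatten(1))

    class ContextEstimator(nn.Module):
        """CXE: aggregates text (TCE), object (OCE), and scene (SCE) context."""
        def __init__(self, feat_dim=512):
            super().__init__()
            # Each child estimator is reduced here to a projection of a
            # precomputed cue vector; in the paper, TCE consumes the output
            # of a scene-text detector, OCE an object recognizer, and SCE a
            # scene recognizer. All input sizes below are assumptions.
            self.tce = nn.Linear(300, feat_dim)   # word-level text embedding
            self.oce = nn.Linear(1000, feat_dim)  # object-score vector
            self.sce = nn.Linear(365, feat_dim)   # scene-category scores

        def forward(self, text_emb, obj_emb, scene_emb):
            return torch.cat(
                [self.tce(text_emb), self.oce(obj_emb), self.sce(scene_emb)],
                dim=1)

    class FineGrainedPlaceClassifier(nn.Module):
        """Fuses CNE and CXE features for fine-grained place classification."""
        def __init__(self, num_classes=28, feat_dim=512):
            super().__init__()
            self.cne = ContentEstimator(feat_dim)
            self.cxe = ContextEstimator(feat_dim)
            # Assumed fusion: simple concatenation of all four features.
            self.head = nn.Linear(feat_dim * 4, num_classes)

        def forward(self, image, text_emb, obj_emb, scene_emb):
            fused = torch.cat(
                [self.cne(image), self.cxe(text_emb, obj_emb, scene_emb)],
                dim=1)
            return self.head(fused)

    # Smoke test with random inputs (batch of 2).
    model = FineGrainedPlaceClassifier()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 300),
                   torch.randn(2, 1000), torch.randn(2, 365))
    print(logits.shape)  # torch.Size([2, 28])

In a faithful reproduction, the three cue vectors would be produced by pretrained detectors and recognizers rather than passed in as random tensors, and the fusion strategy and feature dimensions would follow the paper's actual architecture.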
