ACM Transactions on Multimedia Computing, Communications and Applications

Visual Semantic-Based Representation Learning Using Deep CNNs for Scene Recognition

Abstract

In this work, we address the task of scene recognition from image data. A scene is a spatially correlated arrangement of various visual semantic contents, also known as concepts, e.g., "chair," "car," "sky," etc. Representation learning using visual semantic content can be regarded as one of the most natural ideas, as it mimics the human behavior of perceiving visual information. The semantic multinomial (SMN) representation is one such representation, capturing semantic information through the posterior probabilities of concepts. The core part of obtaining the SMN representation is building the concept models, which requires ground-truth (true) concept labels for every concept present in an image. However, manual labeling of concepts is practically infeasible due to the large number of images in a dataset. To address this issue, we propose an approach for generating pseudo-concepts in the absence of true concept labels. We utilize pre-trained deep CNN-based architectures, where activation maps (filter responses) from convolutional layers are considered as initial cues to the pseudo-concepts. Non-significant activation maps are removed using the proposed filter-specific threshold-based approach, which leads to the removal of non-prominent concepts from the data. Further, we propose a grouping mechanism that merges identical pseudo-concepts using subspace modeling of filter responses to achieve a non-redundant representation. Experimental studies show that the SMN representation generated from pseudo-concepts achieves comparable results on scene recognition tasks over standard datasets such as MIT-67 and SUN-397, even in the absence of true concept labels.
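As a concrete starting point, the sketch below illustrates the first stage described above in PyTorch: activation maps from a convolutional layer of a pre-trained CNN are taken as initial cues to pseudo-concepts, and weak maps are discarded by a threshold. The backbone (VGG-16), the mean-activation "energy" statistic, and the single cut-off rule are illustrative assumptions standing in for the paper's filter-specific criterion, not the authors' exact procedure; subspace-based grouping and the SMN computation are omitted.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone; any deep CNN with accessible convolutional layers would do.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def prominent_activation_maps(image: Image.Image, keep_ratio: float = 0.5):
    # Keep activation maps of the last conv block whose mean activation
    # (a simple "energy" statistic, assumed here) exceeds a fraction of the
    # strongest filter's energy. The paper uses a filter-specific threshold;
    # this single cut-off is only a stand-in.
    x = preprocess(image).unsqueeze(0)             # shape (1, 3, 224, 224)
    with torch.no_grad():
        fmap = backbone(x).squeeze(0)              # shape (512, 7, 7) for VGG-16
    energy = fmap.mean(dim=(1, 2))                 # one scalar per filter
    threshold = keep_ratio * energy.max()          # cut-off for non-significant maps
    keep = energy >= threshold
    return fmap[keep], keep.nonzero().squeeze(1)   # surviving maps and their filter indices

Each surviving map can then be grouped with redundant ones and used to build concept models whose posterior probabilities form the SMN vector, following the pipeline outlined in the abstract.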
