Neurocomputing

Integrating global and local visual features with semantic hierarchies for two-level image annotation



Abstract

Image annotation is a challenging task due to the semantic gap between low-level visual features and high-level human concepts. Most previous annotation methods treat the task as a multi-label classification problem. However, these methods often suffer from poor accuracy and efficiency when plentiful visual variations and large semantic vocabularies are encountered. In this paper, we focus on two-level image annotation by integrating both global and local visual features with semantic hierarchies, in an effort to simultaneously learn annotation correspondences within a relatively small, highly relevant subspace. Given an image, the two-level task comprises scene classification for the whole image and object labeling for its regions. For scene classification, we first define several specific scenes that cover the most common cases in the given image data, and then train support vector machines (SVMs) on the global features. For region labeling, we first formulate a set of abstract nouns in accordance with WordNet to define the relevant objects, and then train local support tensor machines (LSTMs) on high-order regional features. By introducing a new conditional random field (CRF) model that exploits multiple correlations with respect to scene-object hierarchies and object-object relationships, our system achieves a more hierarchical and coherent description of image contents than simpler image annotation methods do. Experimental results on the MSRC and SAIAPR datasets, with comparisons against several state-of-the-art methods, validate the superiority of using multiple visual features and prior semantic correlations for image annotation. (C) 2015 Elsevier B.V. All rights reserved.
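To make the two-level pipeline concrete, here is a minimal sketch in Python under toy assumptions: random feature vectors stand in for the global and regional descriptors, an ordinary per-region SVM stands in for the paper's local support tensor machines, the scene-object and object-object potential tables are random placeholders rather than learned correlations, and brute-force enumeration replaces whatever CRF inference the authors use. None of the names or tables below come from the paper itself.

```python
# Sketch of two-level annotation: scene SVM on global features, per-region
# unary scores (a stand-in for the paper's LSTMs), and a CRF-style energy
# coupling scene-object and object-object terms. All data here is toy.

from itertools import product

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_SCENES, N_OBJECTS = 3, 4

# ---- Level 1: scene classification on global visual features (SVM) ----
X_global = rng.normal(size=(60, 16))           # toy global descriptors
y_scene = rng.integers(0, N_SCENES, size=60)
scene_svm = SVC(kernel="linear").fit(X_global, y_scene)

# ---- Level 2: per-region unary model (simplified stand-in for LSTMs) ----
X_region = rng.normal(size=(200, 8))           # toy regional descriptors
y_object = rng.integers(0, N_OBJECTS, size=200)
region_svm = SVC(kernel="linear", probability=True).fit(X_region, y_object)

# Hypothetical correlation tables; in the paper these priors are learned.
scene_object = rng.uniform(0.1, 1.0, size=(N_SCENES, N_OBJECTS))    # phi(s, l)
object_object = rng.uniform(0.1, 1.0, size=(N_OBJECTS, N_OBJECTS))  # psi(l, l')
object_object = (object_object + object_object.T) / 2               # symmetric

def annotate(image_global, region_feats):
    """Pick the scene, then the region labeling minimizing a CRF-style energy:
    E = -sum_r log P(l_r|x_r) - sum_r log phi(s, l_r) - sum_{r<r'} log psi(l_r, l_r')."""
    scene = int(scene_svm.predict(image_global[None, :])[0])
    unary = np.log(region_svm.predict_proba(region_feats))  # (R, N_OBJECTS)
    R = len(region_feats)
    best, best_energy = None, np.inf
    for labels in product(range(N_OBJECTS), repeat=R):       # brute force: small R
        e = -sum(unary[r, l] for r, l in enumerate(labels))
        e -= sum(np.log(scene_object[scene, l]) for l in labels)
        e -= sum(np.log(object_object[labels[i], labels[j]])
                 for i in range(R) for j in range(i + 1, R))
        if e < best_energy:
            best, best_energy = labels, e
    return scene, best

scene, labels = annotate(rng.normal(size=16), rng.normal(size=(3, 8)))
print("scene:", scene, "region labels:", labels)
```

The sketch connects every pair of regions and enumerates all labelings, which is only feasible for a handful of regions; a practical system would restrict the pairwise term to adjacent regions and use approximate CRF inference such as loopy belief propagation or graph cuts.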
