Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Abstract

Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts, and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, a novel two-branch deep neural network architecture is proposed, which comprises a very deep main network branch and a companion feature fusion network branch designed for fusing the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. To tackle the second issue, we add an auxiliary label quantity prediction task to the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art.
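
The abstract does not say how the predicted label quantity is used at annotation time. One plausible reading is that the model keeps the k highest-scoring labels, where k is the estimated label count. The sketch below illustrates only that selection step, under that assumption; the function name `select_labels`, the score dictionary, and the rounding and clamping rules are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_labels(label_scores: Dict[str, float], predicted_count: float) -> List[str]:
    """Return the top-k labels by score, with k taken from a quantity predictor.

    Hypothetical helper: the paper's actual inference rule is not given
    in the abstract.
    """
    # Round the (possibly fractional) predicted quantity and clamp it to
    # [1, number of candidate labels] so a non-empty score table always
    # yields at least one label.
    k = max(1, min(len(label_scores), int(round(predicted_count))))
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(label_scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Made-up scores for illustration only.
scores = {"beach": 0.91, "sunset": 0.84, "dog": 0.12, "holiday": 0.47}
print(select_labels(scores, predicted_count=2.2))
```

Letting k vary per image, rather than fixing a global top-k or a score threshold, is what allows the number of labels to adapt to each image in this reading.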

Bibliographic record

  • Source
    IEEE Transactions on Image Processing | 2019, Issue 4 | pp. 1720-1731 | 12 pages
  • Author affiliations

    Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing, China;

    Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing, China;

    Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing, China;

    School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.;

    Department of Electrical Engineering, Columbia University, New York, NY, USA;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Image annotation; Feature extraction; Visualization; Task analysis; Noise measurement; Image recognition; Predictive models;

  • Date indexed: 2022-08-18 04:11:48
