Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Yulei Niu; Zhiwu Lu; Ji-Rong Wen; Tao Xiang; Shih-Fu Chang

Abstract

Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts, and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, we propose a novel two-branch deep neural network architecture, which comprises a very deep main network branch and a companion feature fusion network branch designed to fuse the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. To tackle the second issue, we introduce a label quantity prediction auxiliary task alongside the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art.
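
The abstract does not specify layer configurations or fusion operators, so the following is only a minimal Python sketch of the data flow it describes, not the authors' implementation. It assumes that (1) the learned feature fusion branch can be stood in for by global average pooling of each scale's feature map followed by concatenation, (2) the noisy user tags are encoded as a binary bag-of-words vector over a fixed vocabulary and concatenated with the image features, and (3) the label quantity head outputs one probability per possible label count from 1 to K, and the image is annotated with its top-k labels for the most probable k. All function names and toy inputs are hypothetical; the very deep main branch and the trained prediction heads are not modeled.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: one scale's feature map as channels x height x width.
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel of one scale's feature map to its mean activation."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_multiscale(fmaps: Sequence[FeatureMap]) -> List[float]:
    """Stand-in for the learned fusion branch (an assumption):
    pool every scale, then concatenate the pooled vectors."""
    fused: List[float] = []
    for fmap in fmaps:
        fused.extend(global_average_pool(fmap))
    return fused


def encode_tags(tags: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Binary bag-of-words vector marking which vocabulary words the user tagged."""
    present = set(tags)
    return [1.0 if word in present else 0.0 for word in vocab]


def annotate(label_scores: Dict[str, float], count_probs: Sequence[float]) -> List[str]:
    """Return the k highest-scoring labels, where k is the most probable count.

    Assumes count_probs[i] is the predicted probability that the image has i + 1 labels.
    """
    k = max(range(len(count_probs)), key=lambda i: count_probs[i]) + 1
    ranked = sorted(label_scores, key=lambda name: label_scores[name], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Two toy scales: one 2x2 channel, then two 1x1 channels.
    scale_a = [[[1.0, 3.0], [5.0, 7.0]]]
    scale_b = [[[2.0]], [[6.0]]]
    image_feat = fuse_multiscale([scale_a, scale_b])
    tag_feat = encode_tags(["beach", "sunset"], ["beach", "dog", "sunset"])
    joint_input = image_feat + tag_feat  # multi-modal input to the (unmodeled) heads
    print(joint_input)  # [4.0, 2.0, 6.0, 1.0, 0.0, 1.0]

    # Toy head outputs: per-label scores and a label-count distribution.
    scores = {"sky": 0.9, "water": 0.7, "dog": 0.2, "sand": 0.6}
    print(annotate(scores, [0.1, 0.2, 0.6, 0.1]))  # ['sky', 'water', 'sand']
```

Choosing k from a dedicated count head, instead of a fixed top-k or a single global score threshold, is what lets the number of predicted labels vary from image to image.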

机译：图像标注的目的是为给定的图像添加与各种视觉概念相对应的可变数量的类别标签。在本文中，我们解决了大型图像标注中的两个主要问题：1）如何学习适用于预测从对象，场景到抽象概念的各种视觉概念的丰富特征表示，以及2）如何对图像进行注释带有最佳数量的类别标签。为了解决第一个问题，我们提出了一种新颖的多尺度深度模型，用于提取能够代表各种视觉概念的丰富而有区别的特征。具体地，提出了一种新颖的两分支深度神经网络架构，其包括非常深的主网络分支和为融合从主分支计算出的多尺度特征而设计的伴随特征融合网络分支。通过将嘈杂的用户提供的标签作为模型输入来补充图像输入，从而使深度模型成为多模式模型。为了解决第二个问题，我们在主标签预测任务中引入了标签数量预测辅助任务，以明确估计给定图像的最佳标签数量。在两个大型图像注释基准数据集上进行了广泛的实验，结果表明我们的方法明显优于现有技术。

Bibliographic Record

Source
IEEE Transactions on Image Processing, 2019, no. 4, pp. 1720-1731 (12 pages)
Authors
Yulei Niu; Zhiwu Lu; Ji-Rong Wen; Tao Xiang; Shih-Fu Chang
Affiliations

Yulei Niu, Zhiwu Lu, Ji-Rong Wen: Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing, China

Tao Xiang: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.

Shih-Fu Chang: Department of Electrical Engineering, Columbia University, New York, NY, USA

Indexing Information
Original format: PDF
Language: English
Keywords
Image annotation; Feature extraction; Visualization; Task analysis; Noise measurement; Image recognition; Predictive models

Similar Documents

1. Multi-modal multi-concept-based deep neural network for automatic image annotation [J]. Xu Haijiao, Huang Changqin, Huang Xiaodi. Multimedia Tools and Applications, 2019, no. 21.
2. Large-scale image annotation with image-text hybrid learning models [J]. Chien Been-Chian, Ku Chia-Wei. Soft computing: A fusion of foundations, methodologies and applications, 2017, no. 11.
3. Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification [J]. Rasheed Ali Salim, Zabihzadeh Davood, Al-Obaidi Sumia Abdulhussien Razooqi. International Journal of Pattern Recognition and Artificial Intelligence, 2020, no. 13.
4. Target retrieval in large-scale and high-resolution synthetic aperture radar imagery based on deep learning and multi-scale saliency [C]. Song Tu, Junbo Liao, Yi Su. IEEE International Conference on Image Processing, 2016.
5. Image annotation and retrieval based on multi-modal feature clustering and similarity propagation [D]. Ben Ismail, Mohamed Maher. 2011.
6. Building Large-Scale Quantitative Imaging Databases with Multi-Scale Deep Reinforcement Learning: Initial Experience with Whole-Body Organ Volumetric Analyses [O]. David J. Winkel, Hanns-Christian Breit, Thomas J. Weikert. 2021.
7. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning [O]. Noah F. Greenwald, Geneva Miller, Erick Moen. 2021.