A Deep Multi-Modal CNN for Multi-Instance Multi-Label Image Classification

Abstract

Deep convolutional neural networks (CNNs) have shown superior performance on single-label image classification. However, applying CNNs to multi-label images remains an open problem, for two main reasons. First, each image is usually treated as an inseparable entity and represented as one instance, which mixes the visual information corresponding to different labels. Second, the correlations among labels are often overlooked. To address these limitations, we propose a deep multi-modal CNN for multi-instance multi-label image classification, called MMCNN-MIML. By combining CNNs with multi-instance multi-label (MIML) learning, our model represents each image as a bag of instances for image classification and inherits the merits of both CNNs and MIML. In particular, MMCNN-MIML has three appealing properties: 1) it automatically generates instance representations for MIML by exploiting the architecture of CNNs; 2) it takes advantage of label correlations by grouping labels in its later layers; and 3) it incorporates the textual context of label groups to generate multi-modal instances, which are effective in discriminating visually similar objects belonging to different groups. Empirical studies on several benchmark multi-label image data sets show that MMCNN-MIML significantly outperforms the state-of-the-art baselines on multi-label image classification tasks.
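
As a rough illustration of the "bag of instances" idea mentioned in the abstract, the sketch below aggregates per-instance label scores into per-image label scores by max pooling over instances. This is a common MIML assumption (a label applies to the image if at least one instance supports it); the abstract does not state which aggregation MMCNN-MIML uses, so this should not be read as the authors' method. The function names, the example labels, the scores, and the 0.5 threshold are all hypothetical.

```python
from typing import List


def bag_label_scores(instance_scores: List[List[float]]) -> List[float]:
    """Aggregate per-instance label scores into per-bag (per-image) scores.

    instance_scores[i][k] is the score of label k for instance i.
    Max pooling over instances is an assumed aggregation rule here,
    not one confirmed by the abstract.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    num_labels = len(instance_scores[0])
    if any(len(row) != num_labels for row in instance_scores):
        raise ValueError("every instance must score the same set of labels")
    # For each label, keep the strongest evidence found in any instance.
    return [max(row[k] for row in instance_scores) for k in range(num_labels)]


def predict_labels(
    instance_scores: List[List[float]],
    label_names: List[str],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[str]:
    """Return the labels whose bag-level score reaches the threshold."""
    scores = bag_label_scores(instance_scores)
    return [name for name, s in zip(label_names, scores) if s >= threshold]


# Hypothetical bag: three instances (e.g. image regions) scored on three labels.
labels = ["dog", "grass", "car"]
bag = [
    [0.9, 0.2, 0.1],  # instance mostly supporting "dog"
    [0.1, 0.8, 0.0],  # instance mostly supporting "grass"
    [0.3, 0.4, 0.2],  # ambiguous instance
]
print(predict_labels(bag, labels))  # ['dog', 'grass']
```

Because each label is decided by its best-scoring instance, visual evidence for different labels is kept in separate instances rather than mixed into a single image-level representation, which is the first limitation the abstract describes.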

Bibliographic record

  • Source
    Image Processing, IEEE Transactions on | 2018, Issue 12 | pp. 6025-6038 | 14 pages
  • Author affiliations

    Department of Computer Science and Technology, National Engineering Lab of Big Data Analytics, Xi’an Jiaotong University, Xi’an, China;

    Department of Computer Science and Technology, National Engineering Lab of Big Data Analytics, Xi’an Jiaotong University, Xi’an, China;

    Department of Computer Science and Technology, National Engineering Lab of Big Data Analytics, Xi’an Jiaotong University, Xi’an, China;

    Division of Computer Science and Engineering, School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, LA, USA;

    Department of Computer Science and Technology, SPKLSTN Lab, Xi’an Jiaotong University, Xi’an, China;

    School of Foreign Studies, Xi’an Jiaotong University, Xi’an, China;

    Department of Computer Science and Technology, SPKLSTN Lab, Xi’an Jiaotong University, Xi’an, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Visualization; Task analysis; Correlation; Feature extraction; Computer science; Electronic mail; Sun;

  • Date added: 2022-08-17 13:09:48
