Journal of visual communication & image representation

A probabilistic topic model using deep visual word representation for simultaneous image classification and annotation

Abstract

Research has shown that holistic examination of an image leads to better understanding than separate processes each devoted to a single task such as annotation, classification, or segmentation. Over the past decades, there have been several efforts toward simultaneous image classification and annotation using probabilistic or neural-network-based topic models. Despite their relative success, most of these models suffer from poor visual word representation and from the imbalance between the number of visual and annotation words in the training data. This paper proposes a novel model for simultaneous image classification and annotation based on SupDocNADE, a neural-network-based topic model for image classification and annotation. The proposed model, named wSupDocNADE, addresses these shortcomings by introducing a new coding scheme and a weighting mechanism for the SupDocNADE model. In the coding step, several patches extracted from the input image are first fed to a deep convolutional neural network, and the feature vectors obtained from this network are coded using LLC coding. These vectors are then aggregated into a final descriptor through sum pooling. To overcome the imbalance between visual and annotation words, a weighting factor is assigned to each visual or annotation word: the weights of the visual words are set according to their frequencies obtained from the pooling step, while the weights of the annotation words are learned from the training data. Experimental results on three benchmark datasets show the superiority of the proposed model over state-of-the-art models in both image classification and annotation tasks. (C) 2019 Elsevier Inc. All rights reserved.
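The abstract only outlines the coding step (CNN patch features, LLC coding, sum pooling); the paper's actual implementation details are not given here. The following Python sketch is a minimal illustration of what that step could look like. The codebook, the function names llc_code and image_descriptor, and the parameters k and beta are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximate Locality-constrained Linear Coding (LLC) for one patch feature.

    x        : (d,) CNN feature vector of a single image patch
    codebook : (K, d) visual-word codebook (assumed here to come from k-means)
    Returns a sparse (K,) code with non-zeros only on the k nearest codewords.
    """
    # keep the k codewords closest to the feature
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    B = codebook[idx]                      # (k, d) local basis
    z = B - x                              # shift basis to the feature
    C = z @ z.T                            # local covariance
    C += beta * np.trace(C) * np.eye(k)    # regularization for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                           # sum-to-one constraint of LLC
    code = np.zeros(codebook.shape[0])
    code[idx] = w
    return code

def image_descriptor(patch_features, codebook, k=5):
    """Sum-pool the LLC codes of all patches into one visual-word descriptor."""
    codes = np.stack([llc_code(f, codebook, k) for f in patch_features])
    return codes.sum(axis=0)               # (K,) pooled visual-word frequencies
```

Under this reading, the pooled frequencies returned by image_descriptor would serve as the visual-word weights described in the abstract, whereas the annotation-word weights are learned from the training data.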
