
Multi-modal gated recurrent units for image description

Abstract

Using a natural language sentence to describe the content of an image is a challenging but important task. It is challenging because a description must not only capture the objects in the image and the relationships among them, but must also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations, and the multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed into our multi-modal GRU model, a sentence describing the image content is generated. Experimental results demonstrate that our multi-modal GRU model achieves state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets.
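
The abstract does not give the model's equations, so the following is only a rough, hypothetical illustration of the pipeline it describes: a CNN image feature enters a GRU, which then emits one word per step at test time. It uses the standard GRU gating equations in one common convention, h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t. Everything else is an assumption, not the authors' method: the image feature is linearly projected to initialize the hidden state, biases are omitted, weights are random (untrained), and decoding is greedy. `GRUCell`, `MultiModalCaptioner` and the toy vocabulary are hypothetical names chosen for this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell; biases are omitted for brevity."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.Wz, self.Uz = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr, self.Ur = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh, self.Uh = rand_matrix(hidden_dim, input_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a) for a in vadd(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a) for a in vadd(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state is computed from the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_cand = [math.tanh(a) for a in vadd(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]


class MultiModalCaptioner:
    """Hypothetical image-to-sentence decoder built on GRUCell."""

    def __init__(self, vocab, image_dim, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.index = {w: i for i, w in enumerate(vocab)}
        self.embed = rand_matrix(len(vocab), embed_dim, rng)   # word embeddings
        self.W_img = rand_matrix(hidden_dim, image_dim, rng)   # image -> hidden projection
        self.gru = GRUCell(embed_dim, hidden_dim, rng)
        self.W_out = rand_matrix(len(vocab), hidden_dim, rng)  # hidden -> vocabulary scores

    def generate(self, image_feature, max_len=10):
        # Assumption: the projected CNN feature initializes the hidden state.
        h = [math.tanh(a) for a in matvec(self.W_img, image_feature)]
        word, sentence = "<start>", []
        for _ in range(max_len):
            h = self.gru.step(self.embed[self.index[word]], h)
            scores = matvec(self.W_out, h)
            # Greedy decoding: pick the highest-scoring word at each step.
            word = self.vocab[max(range(len(scores)), key=scores.__getitem__)]
            if word == "<end>":
                break
            sentence.append(word)
        return sentence


vocab = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
model = MultiModalCaptioner(vocab, image_dim=8, embed_dim=4, hidden_dim=6)
feature = [0.5] * 8  # stand-in for a CNN image feature vector
print(model.generate(feature))
```

Because the weights are untrained, the printed words are arbitrary; the sketch only shows how an image feature can condition a GRU that emits a variable-length word sequence, stopping at `<end>` or at `max_len`. Training (fitting the weights to image/sentence pairs) is omitted.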

Bibliographic details

  • Source
    Multimedia Tools and Applications | 2018, Issue 22 | pp. 29847-29869 | 23 pages
  • Author affiliations

    Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, University of Chinese Academy of Sciences

    Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, University of Chinese Academy of Sciences

    Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences

  • Original format: PDF
  • Language: English
  • Keywords

    Image description; Gated recurrent unit; Convolutional neural network; Multi-modal embedding
