Multi-modal gated recurrent units for image description

Xuelong Li; Aihong Yuan; Xiaoqiang Lu

Abstract

Describing the content of an image with a natural language sentence is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations, and the multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed to our multi-modal GRU model, a sentence describing the image content is generated. The experimental results demonstrate that our multi-modal GRU model obtains state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets.
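The pipeline in the abstract (an image feature and word representations feeding one GRU, which then emits a sentence of variable length) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the image feature enters the GRU, so the sketch assumes it is linearly projected into the word-embedding space and fed as the first input. The class name `TinyCaptionGRU`, all dimensions, the toy vocabulary and the random, untrained weights are hypothetical, and a plain list of numbers stands in for the CNN feature.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyCaptionGRU:
    """Hypothetical toy model: image feature and words share one GRU."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        # Assumption: the image feature is projected into the embedding space.
        self.W_img = rand_matrix(embed_dim, feat_dim, rng)
        # One embedding row per vocabulary word.
        self.E = rand_matrix(len(vocab), embed_dim, rng)
        # Gate weights act on the concatenation [input, hidden state].
        in_dim = embed_dim + hidden_dim
        self.W_z = rand_matrix(hidden_dim, in_dim, rng)  # update gate
        self.W_r = rand_matrix(hidden_dim, in_dim, rng)  # reset gate
        self.W_h = rand_matrix(hidden_dim, in_dim, rng)  # candidate state
        self.W_out = rand_matrix(len(vocab), hidden_dim, rng)

    def step(self, x, h):
        """One GRU update (bias terms omitted for brevity)."""
        z = sigmoid(matvec(self.W_z, x + h))
        r = sigmoid(matvec(self.W_r, x + h))
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(a) for a in matvec(self.W_h, x + reset_h)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

    def describe(self, image_feature, max_len=10):
        """Greedy decoding: stop at "<end>" or after max_len words."""
        h = [0.0] * self.hidden_dim
        # The projected image feature is the first input to the GRU.
        h = self.step(matvec(self.W_img, image_feature), h)
        word = self.vocab.index("<start>")
        words = []
        for _ in range(max_len):
            h = self.step(self.E[word], h)
            scores = matvec(self.W_out, h)
            word = max(range(len(scores)), key=scores.__getitem__)
            if self.vocab[word] == "<end>":
                break
            words.append(self.vocab[word])
        return words


vocab = ["<start>", "<end>", "a", "dog", "runs"]
model = TinyCaptionGRU(feat_dim=8, embed_dim=4, hidden_dim=6, vocab=vocab)
caption = model.describe([0.5] * 8)  # stand-in for a CNN feature vector
```

Because the weights are random, the returned words carry no meaning; the sketch only shows the data flow. In a trained system the projection, embeddings and gate weights would be learned jointly from image and sentence pairs, which is where the inter-modal relations mentioned in the abstract would be captured.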

Bibliographic details

Source
Multimedia Tools and Applications | 2018, issue 22 | pp. 29847-29869 | 23 pages
Authors
Xuelong Li; Aihong Yuan; Xiaoqiang Lu
Author affiliations

Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, University of Chinese Academy of Sciences;

Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, University of Chinese Academy of Sciences;

Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences;

Original format: PDF
Language: English
Keywords
Image description; Gated recurrent unit; Convolutional neural network; Multi-modal embedding;

Similar literature

1. Reference-based model using multimodal gated recurrent units for image captioning [J]. Tiago do Carmo Nogueira, Cassio Dener Noronha Vinhal, Gelson da Cruz Junior. Multimedia Tools and Applications, 2020, issues 41-42

2. The diverse multi-modal imaging findings of recurrent primary vitreoretinal lymphoma [J]. Jennifer Lee, Debra A. Goldstein. American Journal of Ophthalmology Case Reports, 2020, issue a

3. Quantitative multi-modal MR imaging as a non-invasive prognostic tool for patients with recurrent low-grade glioma [J]. Neill Evan, Luks Tracy, Dayal Manisha. Journal of Neuro-Oncology, 2017, issue 1

4. Generating Image Description on Indonesian Language using Convolutional Neural Network and Gated Recurrent Unit [C]. Aditya Alif Nugraha, Anditya Arifianto, Suyanto. International Conference on Information and Communication Technology, 2019

5. Multi-modal control: From motion description languages to optimal control [D]. Delmotte, Florent C. 2006

6. The diverse multi-modal imaging findings of recurrent primary vitreoretinal lymphoma [O]. Jennifer Lee, Debra A. Goldstein. 2020

7. Multi-modal gated recurrent units for image description [O]. Xuelong Li, Aihong Yuan, Xiaoqiang Lu. 2018
