Generating image captions through multimodal embedding

Dash Sandeep Kumar; Saha Saurav; Pakray Partha; Gelbukh Alexander

Abstract

Caption generation draws on both computer vision and natural language processing, and recent advances in the two fields have produced many efficient models. Automatic image captioning can be used to describe website content or to generate frame-by-frame descriptions of video for visually impaired users, among other applications. This work describes a model that generates novel captions for previously unseen images using a multimodal architecture that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN). The model is trained on Microsoft Common Objects in Context (MSCOCO), an image captioning dataset, and aligns captions and images in the same representation space, so that an image lies close to its relevant captions in that space and far from dissimilar captions and dissimilar images. A ResNet-50 architecture extracts features from the images, and GloVe embeddings together with a Gated Recurrent Unit (GRU) RNN provide the text representation. The MSCOCO evaluation server is used to evaluate the machine-generated caption for a given image.
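The alignment objective in the abstract (an image close to its own captions, far from mismatched ones) can be illustrated with a small sketch. The abstract does not state which loss the authors use; the max-margin ranking loss below is one common choice for joint image-text embeddings and is an assumption here, as are the function names, the `margin` value, and the toy 2-dimensional vectors, which stand in for ResNet-50 image features and GRU caption encodings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def ranking_loss(image_vecs, caption_vecs, margin=0.2):
    """Sum-of-hinges ranking loss over a batch of (image, caption) pairs.

    image_vecs[i] and caption_vecs[i] are assumed to describe the same image;
    every other pairing in the batch is treated as a mismatch.
    The margin of 0.2 is a hypothetical value, not taken from the paper.
    """
    n = len(image_vecs)
    sims = [[cosine(image_vecs[i], caption_vecs[j]) for j in range(n)]
            for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should score higher with its own caption than with caption j.
            loss += max(0.0, margin - sims[i][i] + sims[i][j])
            # Caption i should score higher with its own image than with image j.
            loss += max(0.0, margin - sims[i][i] + sims[j][i])
    return loss


# Toy embeddings already projected into a shared 2-D space (illustrative only).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_captions = [[0.9, 0.1], [0.1, 0.9]]   # each caption near its image
swapped_captions = [[0.1, 0.9], [0.9, 0.1]]   # each caption near the wrong image

aligned_loss = ranking_loss(images, aligned_captions)
swapped_loss = ranking_loss(images, swapped_captions)
```

With these toy vectors the well-aligned batch incurs no penalty, while the swapped batch is penalized on every mismatched pair. Minimizing a loss of this shape pulls matching images and captions together in the shared space.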

Bibliographic information

Source
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 2019, Issue 5, 10 pages
Authors
Dash Sandeep Kumar; Saha Saurav; Pakray Partha; Gelbukh Alexander;
Author affiliations

Natl Inst Technol Mizoram, Dept Comp Sci & Engn, Mizoram, India;

Natl Inst Technol Mizoram, Dept Comp Sci & Engn, Mizoram, India;

Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar, Assam, India;

Natl Polytech Inst, Ctr Comp Res, Nat Language Lab, Mexico City, DF, Mexico;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Automation systems
Keywords
Image captioning; convolutional neural network;
