Generating image captions through multimodal embedding

Abstract

Caption generation requires the best of both computer vision and natural language processing. Owing to recent improvements in both fields, many efficient models have been developed. Automatic image captioning can be used to describe website content, to generate frame-by-frame descriptions of video for the visually impaired, and in many similar applications. This work describes a model that generates novel captions for previously unseen images using a multimodal architecture that combines a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). The model is trained on the Microsoft Common Objects in Context (MSCOCO) image captioning dataset and aligns captions and images in the same representation space, so that an image lies close to its relevant captions in that space and far from dissimilar captions and dissimilar images. The ResNet-50 architecture is used to extract features from the images, and GloVe embeddings are used with a Gated Recurrent Unit (GRU) RNN for text representation. The MSCOCO evaluation server is used to evaluate the machine-generated caption for a given image.
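The abstract does not state which training objective enforces "close to relevant captions, far from dissimilar captions and images". A common choice for this kind of joint image-text embedding is a bidirectional max-margin (hinge) ranking loss, sketched below in plain Python as an illustration only. The function names, the cosine similarity, and the margin of 0.2 are assumptions, not details from the paper; in the described model the image vectors would come from ResNet-50 features and the caption vectors from a GRU run over GloVe word embeddings, both projected into the shared space.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def ranking_loss(images: List[Vector], captions: List[Vector], margin: float = 0.2) -> float:
    """Hypothetical bidirectional hinge ranking loss over one batch.

    images[k] and captions[k] are assumed to be a matching pair; every other
    pairing in the batch is treated as a dissimilar (negative) example.
    The margin value is an illustrative assumption.
    """
    assert len(images) == len(captions), "batch must contain matched pairs"
    n = len(images)
    # scores[i][j] = similarity between image i and caption j
    scores = [[cosine(images[i], captions[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for k in range(n):
        positive = scores[k][k]
        for j in range(n):
            if j == k:
                continue
            # Image k should score its own caption above caption j by the margin.
            loss += max(0.0, margin - positive + scores[k][j])
            # Caption k should score its own image above image j by the margin.
            loss += max(0.0, margin - positive + scores[j][k])
    return loss


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    well_aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(ranking_loss(images, well_aligned))  # no margin violations
    print(ranking_loss(images, swapped))       # large penalty
```

With the toy 2-D vectors above, correctly paired images and captions already satisfy the margin, so the loss is zero; swapping the captions makes every negative outscore the positive, and the loss becomes large. Summing over all negatives in the batch is one design option; taking only the hardest negative per anchor is another, and the abstract does not say which, if either, was used.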