IEEE Transactions on Pattern Analysis and Machine Intelligence

Deep Visual-Semantic Alignments for Generating Image Descriptions


Abstract

We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks (RNN) over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate that our alignment model produces state of the art results in retrieval experiments on Flickr8K, Flickr30K and MSCOCO datasets. We then show that the generated descriptions outperform retrieval baselines on both full images and on a new dataset of region-level annotations. Finally, we conduct large-scale analysis of our RNN language model on the Visual Genome dataset of 4.1 million captions and highlight the differences between image and region-level caption statistics.
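The abstract's alignment objective pairs each word with its most compatible image region and trains with a structured max-margin loss over a batch of image-sentence pairs. The sketch below is a minimal NumPy illustration of that idea, assuming precomputed region and word embeddings in a shared space and a plain dot-product similarity; the function names and shapes are illustrative, not the authors' exact implementation.

```python
import numpy as np

def alignment_score(region_embs, word_embs):
    """Image-sentence score: each word is matched to its best-scoring
    region and the (non-negative) matches are summed.

    region_embs: (n_regions, d) embeddings of image regions
    word_embs:   (n_words, d) embeddings of sentence words
    """
    sims = region_embs @ word_embs.T           # (n_regions, n_words) similarities
    # best region per word, thresholded at 0 as in the alignment objective
    return float(np.maximum(sims, 0.0).max(axis=0).sum())

def ranking_loss(S, margin=1.0):
    """Structured max-margin objective over a batch score matrix S[k, l]
    (image k vs. sentence l), with correct pairs on the diagonal.

    Penalizes any mismatched pair that scores within `margin` of the
    correct pair, in both retrieval directions.
    """
    diag = np.diag(S)
    cost_s = np.maximum(0.0, margin + S - diag[:, None])   # rank sentences given image
    cost_im = np.maximum(0.0, margin + S - diag[None, :])  # rank images given sentence
    np.fill_diagonal(cost_s, 0.0)                          # correct pairs incur no cost
    np.fill_diagonal(cost_im, 0.0)
    return float(cost_s.sum() + cost_im.sum())
```

With well-separated embeddings the loss is zero: if every correct pair outscores all mismatched pairs by at least the margin, both hinge terms vanish, which is what drives the two modalities into a common embedding during training.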
