IEEE Transactions on Pattern Analysis and Machine Intelligence

Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts

Abstract

Recent progress on the automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by an image with accurate and meaningful sentences. In this paper, we propose an image captioning system that exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, where attention shifts among visual regions; such transitions impose a thread of ordering on visual perception. This alignment characterizes the flow of latent meaning, which encodes what is semantically shared by the visual scene and the text description. Our system makes a further novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image; these contexts adapt the language model for word generation to specific scene types. We benchmark our system against published results on several popular datasets, using both automatic evaluation metrics and human evaluation. We show that adding either region-based attention or scene-specific contexts improves over systems that lack those components, and that combining the two attains state-of-the-art performance.
