Show, Attend to Everything, and Tell: Image Captioning with More Thorough Image Understanding

Abstract

Image captioning is one of the most important cross-modal tasks in machine learning. Attention-based encoder-decoder frameworks have been widely used for this task. For visual understanding of an image, the encoder in most of these networks uses the last convolutional layer of a network designed for some other computer vision task. This has several downsides. First, these models are specialized to detect certain objects in the image. Thus, deeper in the network, the features focus on these objects and become almost blind to the rest of the image. These blind spots of the encoder are sometimes where the next word in the caption lies. Moreover, many words in the caption, such as "snow", are not included in the target classes of these tasks. With this observation in mind, and in order to reduce the blind spots of the last convolutional layer of the encoder, we propose a novel method to reuse other convolutional layers of the encoder. Doing so provides diverse features of the image while neglecting almost no part of it; hence, we "attend to everything" in the image. Using the Flickr30k [1] dataset, we evaluate our method and demonstrate results comparable with the state of the art, even with simple attention mechanisms.
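The abstract does not say how the additional convolutional layers are combined, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes each layer's feature map is flattened into region vectors, each layer is projected to a shared size by its own matrix, and plain dot-product soft attention runs over the pooled regions. All names, shapes, and the random inputs are hypothetical.

```python
import math
import random


def project(regions, weight):
    """Map each region vector (length C) to size D using a C x D matrix."""
    return [
        [sum(x * w for x, w in zip(region, column)) for column in zip(*weight)]
        for region in regions
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, hidden):
    """Dot-product soft attention: returns (weights, context vector)."""
    scores = [sum(r * h for r, h in zip(region, hidden)) for region in regions]
    weights = softmax(scores)
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(len(hidden))
    ]
    return weights, context


def attend_to_all_layers(layer_features, projections, hidden):
    """Pool projected regions from every layer, then attend over all of them."""
    pooled = []
    for regions, weight in zip(layer_features, projections):
        pooled.extend(project(regions, weight))
    return attend(pooled, hidden)


# Hypothetical stand-ins for real encoder outputs and a decoder state.
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


D = 4  # shared feature size (assumption)
layer_shapes = [(16, 8), (4, 12), (1, 6)]  # (regions, channels) per layer
layer_features = [random_matrix(n, c) for n, c in layer_shapes]
projections = [random_matrix(c, D) for _, c in layer_shapes]
hidden = [rng.uniform(-1, 1) for _ in range(D)]

weights, context = attend_to_all_layers(layer_features, projections, hidden)
```

In this sketch the decoder can place attention mass on regions from any layer, so regions that the last layer represents poorly are still candidates. A real model would learn the projections and would likely use a learned scoring function in place of the raw dot product.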

Bibliographic record

Source
International Conference on Computer and Knowledge Engineering | 2020 | pp. 001-005 | 5 pages
Authors
Zahra Karimpour; Amirm. Sarfi; Nader Asadi; Fahimeh Ghasemian
Original format: PDF
Keywords
Decoding; Task analysis; Visualization; Neurons; Vocabulary; Machine learning; Computer vision

