Conference: International Conference on Computer and Knowledge Engineering

Show, Attend to Everything, and Tell: Image Captioning with More Thorough Image Understanding

Abstract

Image captioning is one of the most important cross-modal tasks in machine learning. Attention-based encoder-decoder frameworks have been widely used for this task. For visual understanding of an image, most of these networks use, as the encoder output, the last convolutional layer of a network designed for some computer vision task. This has several downsides. First, such models are specialized to detect certain objects in the image. Thus, deeper in the network, the features focus on these objects and become almost blind to the rest of the image. These blind spots of the encoder are sometimes where the next word in the caption lies. Moreover, many words in the caption, such as "snow", are not included in the target classes of these tasks. With this observation in mind, and in order to reduce the blind spots of the last convolutional layer of the encoder, we propose a novel method that reuses other convolutional layers of the encoder. Doing so provides diverse features of the image while neglecting almost no part of it; hence, we "attend to everything" in the image. Using the Flickr30k [1] dataset, we evaluate our method and demonstrate results comparable with the state of the art, even with simple attention mechanisms.
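The abstract does not say how the additional convolutional layers are combined, so the following is only a minimal sketch of one possible reading, not the authors' method: region features from several encoder layers are projected to a shared dimension, pooled into one set, and scored against the decoder state with plain dot-product soft attention. The function names, the per-layer projection matrices, and the toy feature values are all assumptions made for illustration.

```python
import math


def project(vec, weight):
    """Map a C-dimensional feature vector to D dimensions (weight is D x C)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_layers(layer_features, projections, hidden):
    """Soft attention over regions pooled from several encoder layers.

    layer_features: one entry per layer, each a list of region vectors
    projections:    one D x C_l matrix per layer (hypothetical learned weights)
    hidden:         decoder state of length D
    Returns the context vector (length D) and one weight per pooled region.
    """
    # Bring every layer's regions into the same D-dimensional space.
    regions = [
        project(region, weight)
        for layer, weight in zip(layer_features, projections)
        for region in layer
    ]
    # Dot-product relevance of each region to the current decoder state.
    scores = [sum(h * r for h, r in zip(hidden, region)) for region in regions]
    weights = softmax(scores)
    # The context is the attention-weighted sum of all projected regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(len(hidden))
    ]
    return context, weights


# Toy input (assumed values): a shallow layer with 4 regions of 3 channels
# and a deep layer with 2 regions of 4 channels, both projected to D = 2.
shallow = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
deep = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
proj_shallow = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
proj_deep = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
hidden = [0.5, -0.5]

context, weights = attend_over_layers(
    [shallow, deep], [proj_shallow, proj_deep], hidden
)
```

Because the regions of every layer share one softmax, the decoder can place weight on a shallow-layer region that the deepest layer no longer represents, which is the intuition behind "attending to everything". A real model would learn the projections and would probably use a richer scoring function.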
