The Visual Computer

Modeling coverage with semantic embedding for image caption generation



Abstract

This paper presents a coverage-based image caption generation model. The attention-based encoder-decoder framework has advanced the state of the art in image caption generation by learning where to attend in the visual field. However, in some cases it ignores past attention information, which tends to lead to over-recognition and under-recognition. To solve this problem, a coverage mechanism is incorporated into attention-based image caption generation. A sequentially updated coverage vector is applied to preserve the attention history. At each time step, the attention model takes the coverage vector as auxiliary input to focus more on unattended features. In addition, to preserve the semantics of an image, we propose semantic embedding as global guidance for the coverage and attention models. With semantic embedding, the attention and coverage mechanisms give more consideration to features relevant to the semantics of an image. Experiments conducted on three benchmark datasets, namely Flickr8k, Flickr30k and MSCOCO, demonstrate the effectiveness of our proposed approach. In addition to solving the over-recognition and under-recognition problems, it performs better on long descriptions.
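The idea can be illustrated with a minimal sketch of coverage-augmented additive attention that also takes a global semantic embedding as input. This is not the authors' implementation: the `CoverageAttention` class, its layer names, and the dimensions are hypothetical, and it assumes PyTorch with regional CNN features as the visual input.

```python
import torch
import torch.nn as nn

class CoverageAttention(nn.Module):
    """Additive attention with a coverage vector and a global semantic
    embedding as extra inputs (illustrative sketch, not the paper's code)."""

    def __init__(self, feat_dim, hidden_dim, sem_dim, attn_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)    # regional visual features
        self.w_hid = nn.Linear(hidden_dim, attn_dim)   # decoder hidden state
        self.w_cov = nn.Linear(1, attn_dim)            # coverage value per region
        self.w_sem = nn.Linear(sem_dim, attn_dim)      # global semantic embedding
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden, coverage, semantic):
        # feats:    (B, R, feat_dim)  regional CNN features
        # hidden:   (B, hidden_dim)   current decoder state
        # coverage: (B, R)            accumulated past attention weights
        # semantic: (B, sem_dim)      semantic embedding of the whole image
        score = self.v(torch.tanh(
            self.w_feat(feats)
            + self.w_hid(hidden).unsqueeze(1)
            + self.w_cov(coverage.unsqueeze(-1))
            + self.w_sem(semantic).unsqueeze(1)
        )).squeeze(-1)                                 # (B, R)
        alpha = torch.softmax(score, dim=-1)           # attention weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)
        coverage = coverage + alpha                    # sequential coverage update
        return context, alpha, coverage

# Example usage with arbitrary sizes:
attn = CoverageAttention(feat_dim=512, hidden_dim=512, sem_dim=300, attn_dim=256)
feats = torch.randn(2, 49, 512)
coverage = torch.zeros(2, 49)                          # starts empty
context, alpha, coverage = attn(feats, torch.randn(2, 512), coverage, torch.randn(2, 300))
```

The coverage vector starts at zero and accumulates the attention weights across time steps, so regions that have already been attended are signalled to the attention model through the learned coverage projection, while the semantic embedding biases both attention and coverage toward features relevant to the image's semantics.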
