
Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning

Abstract

Existing image paragraph captioning methods generate long paragraph captions solely from input images, relying on insufficient information. In this paper, we propose retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning (RAMP), which makes full use of the R-best retrieved candidate captions to enhance image paragraph captioning via adversarial training. Concretely, RAMP treats the retrieved captions as reference captions to augment the discriminator during adversarial training, encouraging the image captioning model (the generator) to incorporate informative content from the retrieved captions into the generated caption. In addition, a retrieval-enhanced dynamic memory-augmented attention network is devised to keep track of coverage information and attention history along with the update chain of the decoder state, thereby avoiding repetitive or incomplete image descriptions. Finally, a copying mechanism is applied to select words from the retrieved candidate captions and place them in the proper positions of the target caption, so as to improve the fluency and informativeness of the generated caption. Extensive experiments on a benchmark dataset (i.e., Stanford) demonstrate that the proposed RAMP model significantly outperforms state-of-the-art methods across multiple evaluation metrics. For reproducibility, we release the code and data at https://github.com/anonymous-caption/RAMP. (C) 2020 Elsevier B.V. All rights reserved.
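The copying mechanism mentioned in the abstract is not specified in detail on this page. Below is a minimal PyTorch sketch of one plausible pointer-generator-style formulation, assuming a soft gate that mixes the decoder's vocabulary distribution with an attention-based copy distribution over the retrieved-caption tokens. All names (CopyDecoderStep, copy_gate, retrieved_token_ids) are illustrative assumptions, not taken from the authors' released code at https://github.com/anonymous-caption/RAMP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CopyDecoderStep(nn.Module):
    """One decoding step that mixes generation with copying from retrieved captions."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)  # generation head
        self.copy_gate = nn.Linear(hidden_size, 1)             # soft copy/generate gate
        self.attn_proj = nn.Linear(hidden_size, hidden_size)   # bilinear attention

    def forward(self, dec_state, retrieved_enc, retrieved_token_ids):
        # dec_state:           (batch, hidden)        current decoder hidden state
        # retrieved_enc:       (batch, n_tok, hidden) encoded retrieved-caption tokens
        # retrieved_token_ids: (batch, n_tok)         vocabulary ids of those tokens
        # returns:             (batch, vocab)         mixed output distribution

        # Generation distribution over the full vocabulary.
        p_gen = F.softmax(self.vocab_proj(dec_state), dim=-1)

        # Attention over retrieved-caption tokens yields a copy distribution,
        # scattered back onto the corresponding vocabulary positions.
        scores = torch.bmm(retrieved_enc,
                           self.attn_proj(dec_state).unsqueeze(-1)).squeeze(-1)
        copy_attn = F.softmax(scores, dim=-1)                  # (batch, n_tok)
        p_copy = torch.zeros_like(p_gen).scatter_add(1, retrieved_token_ids, copy_attn)

        # The gate decides, per step, how much probability mass to copy.
        gate = torch.sigmoid(self.copy_gate(dec_state))        # (batch, 1)
        return (1.0 - gate) * p_gen + gate * p_copy


if __name__ == "__main__":
    step = CopyDecoderStep(vocab_size=1000, hidden_size=64)
    probs = step(torch.randn(2, 64),
                 torch.randn(2, 20, 64),
                 torch.randint(0, 1000, (2, 20)))
    print(probs.shape, probs.sum(dim=-1))  # torch.Size([2, 1000]); each row sums to ~1
```

A gated mixture like this lets the decoder fall back on generation when the retrieved captions are unhelpful, while still allowing informative retrieved words to be placed directly into the output caption.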

Bibliographic information

  • Source
    Knowledge-Based Systems | 2021, Issue 28 | pp. 106730.1-106730.10 | 10 pages
  • Author affiliations

    Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, People's Republic of China;

    Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, People's Republic of China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China;

    School of Intelligent Engineering, Sun Yat-sen University, Guangzhou, Guangdong, People's Republic of China;

    Harbin Institute of Technology, Shenzhen, People's Republic of China;

    Huazhong University of Science and Technology, Wuhan, Hubei, People's Republic of China;

  • Indexing information
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    Image paragraph captioning; Key-value memory network; Adversarial training;
