Conference: International Conference on Computer Vision

Making History Matter: History-Advantage Sequence Training for Visual Dialog


Abstract

We study multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history. Given a triplet of an image, a Q&A history, and the current question, all prevailing methods follow a codec (i.e., encoder-decoder) fashion in a supervised learning paradigm: a multimodal encoder encodes the triplet into a feature vector, which is then fed into the decoder to generate the current answer, supervised by the ground truth. However, this conventional supervised learning does NOT take into account the impact of imperfect history, which violates the conversational nature of visual dialog and makes the codec more inclined to learn history bias than contextual reasoning. To this end, inspired by the actor-critic policy gradient in reinforcement learning, we propose a novel training paradigm called History Advantage Sequence Training (HAST). Specifically, we intentionally impose wrong answers in the history to obtain an adverse critic, and measure how the historic error impacts the codec’s future behavior through History Advantage, a quantity obtained by subtracting the adverse critic from the gold reward of the ground-truth history. Moreover, to make the codec more sensitive to the history, we propose a novel attention network called History-Aware Co-Attention Network (HACAN), which can be effectively trained using HAST. Experimental results on three benchmarks, VisDial v0.9, VisDial v1.0, and GuessWhat?!, show that the proposed HAST strategy consistently outperforms the state-of-the-art supervised counterparts.
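The History Advantage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `corrupt_history`, `history_advantage`, `toy_reward`, and the `GOLD` dialog are hypothetical, and the toy reward (fraction of history answers that match the ground truth) merely stands in for whatever reward the trained codec would produce. Only the subtraction itself, gold reward minus adverse critic, is taken from the abstract.

```python
from typing import Callable, List, Tuple

Round = Tuple[str, str]  # one dialog round: (question, answer)


def corrupt_history(history: List[Round], round_idx: int, wrong_answer: str) -> List[Round]:
    """Return a copy of the history with one answer deliberately replaced."""
    corrupted = list(history)  # shallow copy, so the gold history is not mutated
    question, _ = corrupted[round_idx]
    corrupted[round_idx] = (question, wrong_answer)
    return corrupted


def history_advantage(
    reward_fn: Callable[[List[Round]], float],
    gold_history: List[Round],
    adverse_history: List[Round],
) -> float:
    """Gold reward minus the adverse critic (reward under the corrupted history)."""
    return reward_fn(gold_history) - reward_fn(adverse_history)


# Hypothetical ground-truth dialog history, used only for illustration.
GOLD: List[Round] = [
    ("is it a dog?", "yes"),
    ("is it brown?", "no"),
    ("is it outside?", "yes"),
]


def toy_reward(history: List[Round]) -> float:
    """Assumed stand-in reward: fraction of history answers matching GOLD."""
    matches = sum(1 for (_, a), (_, g) in zip(history, GOLD) if a == g)
    return matches / len(GOLD)


# Impose a wrong answer in round 1 and measure how much the reward drops.
adverse = corrupt_history(GOLD, 1, "yes")
advantage = history_advantage(toy_reward, GOLD, adverse)
```

In this toy setting a positive `advantage` means the corrupted history lowered the reward, that is, the history mattered. How HAST turns this quantity into a training signal is not specified in the abstract and is not shown here.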
