IEEE/CVF Conference on Computer Vision and Pattern Recognition

FLIPDIAL: A Generative Model for Two-Way Visual Dialogue



Abstract

We present FLIPDIAL, a generative model for Visual Dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FLIPDIAL learns both to answer questions and to put forward questions, and is capable of generating entire sequences of dialogue (question-answer pairs) which are diverse and relevant to the image. To do this, FLIPDIAL relies on a simple but surprisingly powerful idea: it uses convolutional neural networks (CNNs) to encode entire dialogues directly, implicitly capturing dialogue context, and conditional VAEs to learn the generative model. FLIPDIAL outperforms the state-of-the-art model in the sequential answering task (1VD) on the VisDial dataset by 5 points in Mean Rank using the generated answers. We are the first to extend this paradigm to full two-way visual dialogue (2VD), where our model is capable of generating both questions and answers in sequence based on a visual input, and for which we propose a set of novel evaluation measures and metrics.
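For readers unfamiliar with the Mean Rank figure quoted above, the following is a minimal sketch of how such a metric is commonly computed: each question comes with a list of candidate answers, the model assigns a score to every candidate, and the rank of the ground-truth candidate is averaged over all questions (lower is better). The function names, the score convention (higher means better) and the tie-breaking rule are assumptions made for illustration; the abstract does not describe how FLIPDIAL scores candidates from its generated answers.

```python
from typing import Sequence


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the ground-truth candidate.

    Assumption: a higher score means the model prefers that candidate.
    Ties are broken optimistically (only strictly higher scores count).
    """
    gt_score = scores[gt_index]
    # Every candidate scored strictly above the ground truth pushes it down one place.
    return 1 + sum(1 for s in scores if s > gt_score)


def mean_rank(all_scores: Sequence[Sequence[float]], gt_indices: Sequence[int]) -> float:
    """Average the ground-truth rank over all questions (lower is better)."""
    if len(all_scores) != len(gt_indices):
        raise ValueError("need exactly one ground-truth index per question")
    if not all_scores:
        raise ValueError("need at least one question")
    ranks = [rank_of_ground_truth(s, g) for s, g in zip(all_scores, gt_indices)]
    return sum(ranks) / len(ranks)
```

As a usage example, `mean_rank([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]], [2, 0])` averages a rank of 2 for the first question and a rank of 1 for the second.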
