FLIPDIAL: A Generative Model for Two-Way Visual Dialogue

Abstract

We present FLIPDIAL, a generative model for Visual Dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FLIPDIAL learns both to answer questions and to put forward questions, and is capable of generating entire sequences of dialogue (question-answer pairs) which are diverse and relevant to the image. To do this, FLIPDIAL relies on a simple but surprisingly powerful idea: it uses convolutional neural networks (CNNs) to encode entire dialogues directly, implicitly capturing dialogue context, and conditional VAEs to learn the generative model. FLIPDIAL outperforms the state-of-the-art model in the sequential answering task (1VD) on the VisDial dataset by 5 points in Mean Rank using the generated answers. We are the first to extend this paradigm to full two-way visual dialogue (2VD), where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.
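
The abstract reports its 1VD result in Mean Rank. As a rough illustration only, the sketch below computes a mean rank under the usual convention that each question comes with a list of scored candidate answers, one of which is the ground truth, and that a lower rank is better. The function names and the toy scores are hypothetical and are not taken from the paper or from any official evaluation code.

```python
from typing import Sequence


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth candidate, sorting by descending score.

    Assumption: ties are resolved in favour of the ground truth.
    """
    gt_score = scores[gt_index]
    # Count the candidates that the model scored strictly higher.
    return 1 + sum(1 for s in scores if s > gt_score)


def mean_rank(all_scores: Sequence[Sequence[float]],
              gt_indices: Sequence[int]) -> float:
    """Average ground-truth rank over all questions (lower is better)."""
    ranks = [rank_of_ground_truth(s, g) for s, g in zip(all_scores, gt_indices)]
    return sum(ranks) / len(ranks)


# Toy data: two questions with three candidate answers each (hypothetical).
example_scores = [
    [0.1, 0.7, 0.2],  # ground truth at index 1 is ranked 1st
    [0.5, 0.3, 0.9],  # ground truth at index 0 is ranked 2nd
]
example_gt = [1, 0]

print(mean_rank(example_scores, example_gt))  # 1.5
```

Under this reading, an improvement of "5 points in Mean Rank" means the ground-truth answer sits, on average, five positions closer to the top of the ranked candidate list.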

Bibliographic details

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2018 | pp. 6097-6105 | 9 pages
Conference location
Salt Lake City (US)
Authors
Puneet K. Dokania; Philip H.S. Torr; N. Siddharth; Daniela Massiceti
Original format
PDF
Keywords
Visualization; Task analysis; Computational modeling; History; Data models; Pediatrics; Image color analysis
Date indexed
2022-08-26 14:35:27

Similar literature

1. A simple generative model of incremental reference resolution for situated dialogue [J]. Casey Kennington, David Schlangen. Computer speech and language. 2017, issue jana
2. The dialogue model: using a visualized dialogue to create connection and cooperation [J]. Westermann G., Maurer J. European child & adolescent psychiatry. 2015, issue Suppla1
3. Generative Topographic Mapping Approach to Modeling and Chemical Space Visualization of Human Intestinal Transporters [J]. Timur R. Gimadiev, Timur I. Madzhidov, Gilles Marcou, BioNanoscience. 2016, issue 4
4. FLIPDIAL: A Generative Model for Two-Way Visual Dialogue [C]. Puneet K. Dokania, Philip H.S. Torr, N. Siddharth, IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018
5. Zero Shot Learning for Visual Object Recognition with Generative Models [D]. Vyas, Maunil. 2020
6. Expert-Guided Generative Topographical Modeling with Visual to Parametric Interaction [O]. Chao Han, Leanna House, Scotland C. Leman
7. FLIPDIAL: A Generative Model for Two-Way Visual Dialogue [O]. Puneet K. Dokania, Philip H.S. Torr, N. Siddharth, 2018
