Conference: IEEE International Conference on Computer Vision

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning



Abstract

We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative 'image guessing' game between two agents, Q-BOT and A-BOT, who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end, from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a 'sanity check' demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in ungrounded vocabularies, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among 'visual' dialog agents with no human supervision. Second, we conduct large-scale real-image experiments on the VisDial dataset [5], where we pretrain on dialog data with supervised learning (SL) and show that the RL fine-tuned agents significantly outperform supervised pretraining. Interestingly, the RL Q-BOT learns to ask questions that A-BOT is good at, ultimately resulting in more informative dialog and a better team.
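The synthetic-world experiment can be loosely illustrated with a much smaller toy. The sketch below is not the paper's model: it assumes a single-round game, tabular softmax policies instead of neural networks, and one attribute with three values. It only shows how two agents that share a reward can be trained with REINFORCE so that symbols with no pre-specified meaning may acquire one. All names (`N_ATTR`, `N_SYM`, `train`, `greedy_accuracy`) and all hyperparameters are hypothetical.

```python
import math
import random

N_ATTR = 3  # assumption: one attribute with three values (e.g. three colors)
N_SYM = 3   # assumption: ungrounded vocabulary of three symbols (e.g. X, Y, Z)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_update(logits, action, advantage, lr):
    """One REINFORCE step: d/dlogits log softmax(logits)[action] = onehot - probs."""
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def train(episodes=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # A-BOT sees the attribute value and emits a symbol.
    a_bot = [[0.0] * N_SYM for _ in range(N_ATTR)]
    # Q-BOT sees only the symbol and guesses the attribute value.
    q_bot = [[0.0] * N_ATTR for _ in range(N_SYM)]
    baseline = 0.0  # running mean reward, used to reduce gradient variance
    for _ in range(episodes):
        attr = rng.randrange(N_ATTR)
        sym = sample(softmax(a_bot[attr]), rng)
        guess = sample(softmax(q_bot[sym]), rng)
        reward = 1.0 if guess == attr else 0.0  # shared, cooperative reward
        advantage = reward - baseline
        reinforce_update(a_bot[attr], sym, advantage, lr)
        reinforce_update(q_bot[sym], guess, advantage, lr)
        baseline += 0.01 * (reward - baseline)
    return a_bot, q_bot


def greedy_accuracy(a_bot, q_bot):
    """Fraction of attribute values recovered when both agents act greedily."""
    correct = 0
    for attr in range(N_ATTR):
        sym = max(range(N_SYM), key=lambda s: a_bot[attr][s])
        guess = max(range(N_ATTR), key=lambda g: q_bot[sym][g])
        correct += int(guess == attr)
    return correct / N_ATTR


if __name__ == "__main__":
    a_bot, q_bot = train()
    print("greedy accuracy:", greedy_accuracy(a_bot, q_bot))
```

Whether a clean one-to-one protocol emerges in this toy depends on the seed and the learning rate, since signaling games of this kind can settle into partially ambiguous conventions. The abstract's agents additionally exchange several rounds of dialog and, in the VisDial experiments, start from supervised pretraining, neither of which is modeled here.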
