IEEE International Conference on Multimedia and Expo
Twice Opportunity Knocks Syntactic Ambiguity: A Visual Question Answering Model with yes/no Feedback

Abstract

Visual Question Answering (VQA) is a joint vision-and-language task that aims to answer questions about given images. Syntactic ambiguity is common in human dialog, and it can also be found in the questions posed to VQA systems. Existing VQA methods generally adopt one-shot answering frameworks, which face great difficulty when a question is syntactically ambiguous. In human dialog, people often resolve such ambiguity by asking a question back for confirmation. Inspired by this observation, we propose a novel method that eliminates syntactic ambiguity in VQA via the user's yes/no feedback. We compared our method with existing methods on two benchmark datasets, CLEVR and CLEVR-CoGenT. Our method achieves accuracy close to 100% on CLEVR, and on CLEVR-CoGenT its accuracy is 21% higher than the state-of-the-art method.
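The idea in the abstract can be sketched as a simple control flow: answer in one shot when a question parses uniquely, and otherwise feed back a yes/no confirmation question to the user before committing to an answer. The sketch below is illustrative only; all function names (`parse_question`, `answer_with_feedback`) and the toy ambiguity trigger are assumptions, not the authors' actual model or API.

```python
# Hypothetical sketch of the yes/no-feedback loop described in the abstract.
# A real system would detect ambiguity with a syntactic parser and answer
# with a trained VQA model; both are stubbed out here.

def parse_question(question):
    """Return all candidate parses; more than one signals syntactic ambiguity."""
    # Toy trigger: a phrase like "left of" can attach to different objects,
    # e.g. "the cube left of the sphere" vs. a different attachment reading.
    if "left of" in question:
        return ["parse_attach_first", "parse_attach_second"]
    return ["parse_unique"]

def answer_with_feedback(question, vqa_answer, ask_user_yes_no):
    """Answer one-shot if unambiguous; otherwise confirm the parse first."""
    parses = parse_question(question)
    if len(parses) == 1:
        return vqa_answer(question, parses[0])
    # Ambiguous: knock twice - ask a yes/no question, then answer under
    # whichever reading the user confirms.
    if ask_user_yes_no(f"Do you mean: {parses[0]}?"):
        chosen = parses[0]
    else:
        chosen = parses[1]
    return vqa_answer(question, chosen)
```

For example, with a stub model that just echoes the chosen parse, `answer_with_feedback("What is left of the sphere?", lambda q, p: p, lambda prompt: False)` returns the second reading, because the simulated user rejected the first one.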
