Foreign-language conference: Chinese Conference on Pattern Recognition and Computer Vision
Semantic Reanalysis of Scene Words in Visual Question Answering

Abstract

Visual Question Answering (VQA) is a joint task that aims to answer questions based on given images. Correctly analysing questions that span multiple photo albums remains a key challenge in VQA: when a question draws on several albums, the model must correctly understand both the album images and the corresponding question. With multiple albums involved and scene words present in the question, the model may interpret the wrong scene and output the wrong answer, which lowers VQA performance. To address this problem, this paper proposes a new image and sentence similarity matching model, which outputs the correct image representation by learning semantic concepts. Because a scene word is not an entity, the information the model extracts is sometimes incorrect. The question is therefore reanalysed in a different way, and the answer is given by the similarity between the question and the visual-text. The model was tested on the MemexQA dataset. The experimental results show that it not only produces meaningful text sentences that support the correctness of the answer, but also improves accuracy by nearly 10%.
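
The abstract says the answer is chosen by the similarity between the question and the visual-text, but gives no implementation details. The following is only a minimal sketch of that general idea, assuming the visual-text is a set of short sentences describing the album images and that a bag-of-words cosine similarity stands in for the learned matching model. The function names (`bow`, `cosine`, `best_match`) and the example sentences are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (assumed stand-in for learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question, visual_texts):
    """Return (index, score) of the visual-text sentence most similar to the question."""
    q = bow(question)
    scores = [cosine(q, bow(t)) for t in visual_texts]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Hypothetical visual-text sentences, one per album image.
visual_texts = [
    "a birthday cake on a table indoors",
    "people eat sandwiches at the beach",
    "a dog running in a park",
]
idx, score = best_match("what did we eat at the beach", visual_texts)
print(idx, visual_texts[idx])
```

In this toy setting the scene word "beach" in the question steers the match toward the second sentence. A real system would replace the word-count vectors with learned image and sentence representations.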