International Workshop of Physical Agents

Towards Fine-Tuning of VQA Models in Public Datasets



Abstract

This paper studies Visual Question Answering (VQA), a topic that combines Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Representation & Reasoning (KR&R) to automatically produce natural-language answers to questions users ask about images. We first review the state of the art for this technology. Among the different approaches, we select the model known as Pythia to build upon, because it is one of the most popular and successful methods in the public VQA Challenge. The Pythia code was recently given an exhaustive overhaul by Facebook AI Research (FAIR). After confirming that the two implementations have analogous characteristics, we choose to use this updated framework. We introduce the different modules of the FAIR implementation and explain how to train our model, proposing some improvements over the baseline. Several fine-tuned models are trained, the best of which reaches an accuracy of 66.22% on the test set of the public VQA-v2 dataset. We compare the quantitative results of the most important experiments and discuss some qualitative results. This experimentation is carried out with the aim of eventually applying VQA to eCommerce and store-observation use cases in further research.
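The 66.22% figure reported above follows the VQA-v2 consensus metric, in which a predicted answer is scored against the ten human-annotated answers collected for each question. The sketch below shows the core of that metric in simplified form, assuming exact string matching; the official evaluation additionally normalizes answer strings and averages over annotator subsets, which this sketch omits.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA-v2 consensus accuracy for one question.

    An answer is fully correct (score 1.0) when at least 3 of the
    human annotators gave it; fewer matches yield partial credit.
    """
    matches = sum(1 for ans in human_answers if ans == predicted)
    return min(matches / 3.0, 1.0)


# A dataset-level accuracy is then the mean over all questions:
def dataset_accuracy(predictions: list[str],
                     annotations: list[list[str]]) -> float:
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, annotations)]
    return sum(scores) / len(scores)
```

For example, a prediction matching exactly one of the ten human answers scores 1/3, while matching three or more scores 1.0.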

