International Workshop of Physical Agents

Towards Fine-Tuning of VQA Models in Public Datasets



Abstract

This paper studies Visual Question Answering (VQA), a topic that combines Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Representation & Reasoning (KR&R) to automatically produce natural-language answers to questions users ask about images. We first review the state of the art for this technology. Among the different approaches, we select the model known as Pythia to build upon, because it is one of the most popular and successful methods in the public VQA Challenge. The Pythia code was recently given an exhaustive overhaul by Facebook AI Research (FAIR). After confirming that the two implementations have analogous characteristics, we choose to use this updated framework. We introduce the different modules of the FAIR implementation and explain how to train our model, proposing some improvements over the baseline. Several fine-tuned models are trained, the best of which reaches an accuracy of 66.22% on the test set of the public VQA-v2 dataset. We compare the quantitative results of the most important experiments and discuss some qualitative results. This experimentation is carried out with the aim of eventually applying VQA to eCommerce and store-observation use cases in further research.
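The 66.22% figure reported above follows the VQA-v2 consensus metric, in which a predicted answer is scored against the ten human-annotated answers collected for each question. The sketch below shows the core of that metric in simplified form, assuming exact string matching; the official evaluation additionally normalizes answer strings and averages over annotator subsets, which this sketch omits.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA-v2 consensus accuracy for one question.

    An answer is fully correct (score 1.0) when at least 3 of the
    human annotators gave it; fewer matches yield partial credit.
    """
    matches = sum(1 for ans in human_answers if ans == predicted)
    return min(matches / 3.0, 1.0)


# A dataset-level accuracy is then the mean over all questions:
def dataset_accuracy(predictions: list[str],
                     annotations: list[list[str]]) -> float:
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, annotations)]
    return sum(scores) / len(scores)
```

For example, a prediction matching exactly one of the ten human answers scores 1/3, while matching three or more scores 1.0.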

