Visual question answering: Datasets, algorithms, and future challenges

Kushal Kafle; Christopher Kanan


Abstract

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.


Bibliographic details

Source: Computer Vision and Image Understanding, 2017, Issue 10, pp. 3-20 (18 pages)
Authors: Kushal Kafle; Christopher Kanan
Affiliation (both authors): Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology, Rochester, NY 14623, USA
Original format: PDF
Text language: English
Keywords: Image understanding; Natural language processing; Vision and language

Similar literature

1. Visual question answering: A survey of methods and datasets [J]. Qi Wu, Damien Teney, Peng Wang, et al. Computer Vision and Image Understanding, 2017.
2. Multiple answers to a question: a new approach for visual question answering [J]. Hosseinabad Sayedshayan Hashemi, Safayani Mehran, Mirzaei Abdolreza. The Visual Computer, 2021, Issue 1.
3. Question-aware prediction with candidate answer recommendation for visual question answering [J]. B. Kim, J. Kim. Electronics Letters, 2017, Issue 18.
4. CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images [C]. Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, et al. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
5. Attention Correction Mechanisms in Visual Contexts in Visual Question Answering [D]. Sharan, Komal. 2018.
6. A dataset of clinically generated visual questions and answers about radiology images [O]. Jason J. Lau, Soumya Gayen, Asma Ben Abacha, et al. 2018.
7. Visual Question Answering: Datasets, Algorithms, and Future Challenges [O]. Kafle, Kushal; Kanan, Christopher. 2017.
