OCR-VQA: Visual Question Answering by Reading Text in Images

Abstract

The problem of answering questions about an image is popularly known as visual question answering (VQA). It is a well-established problem in computer vision. However, none of the current VQA methods use the text that is often present in the image. Such "text in images" provides additional useful cues and facilitates a better understanding of the visual content. In this paper, we introduce a novel task of visual question answering by reading text in images, i.e., by optical character recognition (OCR). We refer to this problem as OCR-VQA. To facilitate a systematic study of this new problem, we introduce a large-scale dataset, namely OCR-VQA-200K. This dataset comprises 207,572 images of book covers and contains more than 1 million question-answer pairs about these images. We judiciously combine well-established techniques from the OCR and VQA domains to present a novel baseline for OCR-VQA-200K. The experimental results and rigorous analysis demonstrate various challenges present in this dataset, leaving ample scope for future research. We are optimistic that this new task, along with the compiled dataset, will open up many exciting research avenues for both the document image analysis and VQA communities.

Bibliographic Information

Source: International Conference on Document Analysis and Recognition, 2019, pp. 947-952 (6 pages)
Authors: Anand Mishra; Shashank Shekhar; Ajeet Kumar Singh; Anirban Chakraborty
Original format: PDF
Keywords: Optical character recognition software; Visualization; Task analysis; Knowledge discovery; Text analysis; Text recognition; Character recognition

Similar Documents

1. Better Generic Objects Counting When Asking Questions to Images: A Multitask Approach for Remote Sensing Visual Question Answering [J]. S. Lobry, D. Marcos, B. Kellenberger. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, No. 5.

2. Focal Visual-Text Attention for Memex Question Answering [J]. Junwei Liang, Lu Jiang, Liangliang Cao. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, No. 8.

3. A Question-Centric Model for Visual Question Answering in Medical Imaging [J]. Minh H. Vu, Tommy Lofstedt, Tufve Nyholm. IEEE Transactions on Medical Imaging, 2020, No. 9.

4. OCR-VQA: Visual Question Answering by Reading Text in Images [C]. Anand Mishra, Shashank Shekhar, Ajeet Kumar Singh. International Conference on Document Analysis and Recognition, 2019.

5. Visual Reasoning and Image Understanding: A Question Answering Approach [D]. Md. Moshiur Rahman Farazi. 2020.

6. Towards Answering Biological Questions with Experimental Evidence: Automatically Identifying Text that Summarize Image Content in Full-Text Articles [O]. Hong Yu. 2006.

7. Focal Visual-Text Attention for Visual Question Answering [O]. Junwei Liang, Lu Jiang, Liangliang Cao. 2018.

8. Answering Questions from Oceanography Texts: Learner, Task and Text Characteristics [R]. S. R. Goldman, R. P. Duran. 1987.
