Object sequences: encoding categorical and spatial information for a yes/no visual question answering task

Shivam Garg; Rajeev Srivastava

Abstract

The task of visual question answering (VQA) has gained wide popularity in recent times. Effectively solving the VQA task requires understanding both the visual content of the image and the language information in the text-based question. In this study, the authors propose a novel method of encoding the visual information (categorical and spatial object information) of all the objects present in the image into a sequential format, called an object sequence. These object sequences can then be processed by a neural network. The authors experiment with multiple techniques for obtaining a joint embedding from the visual features (in the form of object sequences) and the language-based features obtained from the question. They also provide a detailed analysis of the performance of a neural network architecture using object sequences on the Oracle task of the GuessWhat dataset (a yes/no VQA task) and benchmark it against the baseline.
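The abstract does not give the exact encoding, so the following is only a minimal sketch of what an object sequence could look like. It assumes each object contributes a one-hot category vector concatenated with its normalized bounding-box corners, and that objects are ordered left to right. The names `DetectedObject` and `encode_object_sequence`, the feature layout, and the ordering rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DetectedObject:
    """One annotated object (hypothetical container, not from the paper)."""
    category: str  # class label, e.g. "person"
    x: float       # left edge of the bounding box, in pixels
    y: float       # top edge of the bounding box, in pixels
    w: float       # box width, in pixels
    h: float       # box height, in pixels


def encode_object_sequence(
    objects: Sequence[DetectedObject],
    categories: Sequence[str],
    img_w: float,
    img_h: float,
) -> List[List[float]]:
    """Turn the objects of one image into a list of per-object feature vectors.

    Each vector is [one-hot category | x_min, y_min, x_max, y_max], with the
    box corners scaled to [0, 1] by the image size.
    """
    index = {name: i for i, name in enumerate(categories)}
    # Assumed ordering: left to right, ties broken top to bottom.
    ordered = sorted(objects, key=lambda o: (o.x, o.y))
    sequence = []
    for obj in ordered:
        one_hot = [0.0] * len(categories)
        one_hot[index[obj.category]] = 1.0  # categorical information
        spatial = [                          # spatial information
            obj.x / img_w,
            obj.y / img_h,
            (obj.x + obj.w) / img_w,
            (obj.y + obj.h) / img_h,
        ]
        sequence.append(one_hot + spatial)
    return sequence


if __name__ == "__main__":
    objs = [
        DetectedObject("dog", 50, 20, 40, 30),
        DetectedObject("person", 10, 10, 20, 60),
    ]
    for step in encode_object_sequence(objs, ["person", "dog"], 100, 100):
        print(step)
```

A sequence of this shape could be fed to a recurrent or other sequence model and combined with question features; how the authors actually build the joint embedding is described in the paper itself, not here.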


Bibliographic details

Source
Computer Vision, IET | 2018, Issue 8 | pp. 1141-1150 | 10 pages

Authors
Shivam Garg; Rajeev Srivastava

Author affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (BHU), India;

Department of Computer Science and Engineering, Indian Institute of Technology (BHU), India;

Record information

Original format: PDF

Language: English

Keywords
image coding; image sequences; neural net architecture; question answering (information retrieval); text analysis
