Journal: Computer Vision, IET

Object sequences: encoding categorical and spatial information for a yes/no visual question answering task



Abstract

Visual question answering (VQA) has gained wide popularity in recent times. Solving the VQA task effectively requires understanding both the visual content of the image and the language information in the text-based question. In this study, the authors propose a novel method of encoding the visual information (categorical and spatial object information) of all the objects present in the image into a sequential format, called an object sequence. These object sequences can then be processed by a neural network. The authors experiment with multiple techniques for obtaining a joint embedding from the visual features (in the form of object sequences) and the language-based features obtained from the question. They also provide a detailed analysis of the performance of a neural network architecture using object sequences on the Oracle task of the GuessWhat dataset (a yes/no VQA task) and benchmark it against the baseline.
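To make the idea of an object sequence concrete, the following is a minimal sketch and not the authors' implementation. It assumes each object is encoded as a one-hot category vector concatenated with its bounding box corners normalized to [0, 1], that objects are ordered left to right, and that the sequence is zero-padded to a fixed length. The names `DetectedObject`, `encode_object`, and `build_object_sequence`, as well as the ordering and padding choices, are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedObject:
    """Hypothetical container for one annotated object in an image."""
    category_id: int  # index into a fixed category vocabulary (assumption)
    x: float          # left edge of the bounding box, in pixels
    y: float          # top edge of the bounding box, in pixels
    w: float          # box width, in pixels
    h: float          # box height, in pixels


def encode_object(obj: DetectedObject, num_categories: int,
                  img_w: float, img_h: float) -> List[float]:
    """Encode one object as [one-hot category | normalized box corners]."""
    one_hot = [0.0] * num_categories
    one_hot[obj.category_id] = 1.0
    # Spatial part (assumed layout): x_min, y_min, x_max, y_max in [0, 1].
    spatial = [
        obj.x / img_w,
        obj.y / img_h,
        (obj.x + obj.w) / img_w,
        (obj.y + obj.h) / img_h,
    ]
    return one_hot + spatial


def build_object_sequence(objects: List[DetectedObject], num_categories: int,
                          img_w: float, img_h: float,
                          max_len: int) -> List[List[float]]:
    """Turn all objects in an image into a fixed-length sequence of vectors."""
    # Assumed ordering: left to right, then top to bottom.
    ordered = sorted(objects, key=lambda o: (o.x, o.y))
    seq = [encode_object(o, num_categories, img_w, img_h)
           for o in ordered[:max_len]]
    # Zero-pad so every image yields a sequence of the same length.
    pad = [0.0] * (num_categories + 4)
    seq.extend(list(pad) for _ in range(max_len - len(seq)))
    return seq
```

A sequence built this way could be fed to a recurrent or other sequence model and combined with a question embedding. How the authors actually form the joint embedding is not described in the abstract.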
