Optimal Image Feature Ranking and Fusion for Visual Question Answering


Abstract

Visual Question Answering (VQA) is a relatively new and challenging multimodal task that seeks an answer to a given pair of an image and a related question. This AI-complete task has attracted numerous researchers in computer vision (CV) and natural language processing (NLP) because of its many potential applications. A VQA algorithm generally consists of image feature extraction, question feature extraction, and joint comprehension of the two to generate an appropriate answer. Existing VQA systems have paid little attention to input feature extraction and have focused instead on different methods of multimodal embedding. This paper proposes to improve VQA through feature-level fusion of visual information. The goal of feature fusion is to consolidate relevant information from two or more feature vectors into a single vector with greater discriminative power. Instead of simple concatenation, this paper uses discriminative correlation analysis (DCA) for fusion, which is the only method that incorporates class structure into feature-level fusion. Because VQA systems are generally modeled as classification systems that treat the correct answers as classes, class-specific DCA is well suited to the task. The resulting fused feature vectors lie close to the correct answers and thus strengthen the role of image understanding in VQA. Experimental results on the DAQUAR dataset, with mutual information (MI) as the evaluation metric, show the effectiveness of the new approach.
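
The abstract does not spell out how mutual information is computed against the answers. As a minimal sketch only, assuming each feature has already been discretized into bins and each correct answer is treated as a class label (as the abstract describes), MI between a feature and the answer class could be estimated from co-occurrence counts and used to rank features. The function names `mutual_information` and `rank_features` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(feature_values, answers):
    """Estimate I(F; A) in bits from paired discrete observations."""
    if len(feature_values) != len(answers):
        raise ValueError("inputs must have the same length")
    n = len(answers)
    if n == 0:
        raise ValueError("inputs must be non-empty")
    joint = Counter(zip(feature_values, answers))  # counts of (f, a) pairs
    count_f = Counter(feature_values)              # marginal counts of f
    count_a = Counter(answers)                     # marginal counts of a
    mi = 0.0
    for (f, a), count in joint.items():
        # p(f, a) * log2(p(f, a) / (p(f) * p(a))), written with raw counts
        mi += (count / n) * log2(count * n / (count_f[f] * count_a[a]))
    return mi


def rank_features(feature_columns, answers):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(column, answers)
        for name, column in feature_columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four questions whose answers act as classes.
answers = ["chair", "chair", "table", "table"]
features = {
    "noise_bin": [0, 1, 0, 1],  # statistically independent of the answer
    "fused_bin": [0, 0, 1, 1],  # separates the two answers perfectly
}
ranking = rank_features(features, answers)
```

In this toy case the feature that separates the two answers carries one bit of information about the answer class and ranks first, while the independent feature carries none. Real image features are continuous, so the binning step assumed here would itself affect the estimate.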

Bibliographic details

Source
International Conference on Frontiers of Intelligent Computing: Theory and Applications | 2021 | xxi, 797 pages | 11 pages in total
Authors
Sruthy Manmadhan; Binsu C. Kovoor
Original format: PDF
Chinese Library Classification: 73.918083
Keywords
Convolutional neural networks; Discriminative correlation analysis; Feature extraction; Mutual information; Visual question answering
