Conference: International Conference on Frontiers of Intelligent Computing: Theory and Applications

Optimal Image Feature Ranking and Fusion for Visual Question Answering


Abstract

Visual Question Answering (VQA) is a relatively new and challenging multimodal task that seeks an answer for a given pair of an image and a related question. This AI-complete task has attracted many researchers in computer vision (CV) and natural language processing (NLP) because of its many potential applications. The general flow of a VQA algorithm consists of image feature extraction, question feature extraction, and joint comprehension of the two to generate an appropriate answer. Existing VQA systems have paid little attention to input feature extraction and have focused instead on different ways of multimodal embedding. This paper proposes to improve VQA through feature-level fusion of visual information. The goal of feature fusion is to consolidate the relevant information from two or more feature vectors into a single vector with greater discriminative power. Instead of simple concatenation, this paper uses discriminant correlation analysis (DCA) for fusion, which is the only method that incorporates the class structure into feature-level fusion. Because VQA systems are generally modeled as classification systems in which the correct answers are treated as classes, class-specific DCA suits the task well. The fused feature vectors lie closer to the right answers and thus strengthen the role of image understanding in VQA. Experimental results on the DAQUAR dataset, with mutual information (MI) as the evaluation metric, show the effectiveness of the new approach.
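To make the idea of class-aware feature-level fusion concrete, the following is a minimal NumPy sketch of the DCA recipe as it is commonly described: each feature set is first projected so that its between-class scatter becomes the identity, and the two projected sets are then rotated so that their cross-product is diagonal. The function names, the column-per-sample layout, the choice of `r`, and concatenation as the final fusion step are assumptions made for illustration. The abstract does not state which image features are fused or how the authors implemented the method.

```python
import numpy as np


def between_class_whitener(X, labels, r):
    """Return W (p x r) with W.T @ S_b @ W = I, S_b being the between-class scatter of X.

    X holds one sample per column (p features x n samples); labels is a NumPy array.
    """
    overall_mean = X.mean(axis=1, keepdims=True)
    # Columns of Phi are sqrt(n_i) * (class mean - overall mean), so S_b = Phi @ Phi.T.
    Phi = np.hstack([
        np.sqrt(np.sum(labels == c))
        * (X[:, labels == c].mean(axis=1, keepdims=True) - overall_mean)
        for c in np.unique(labels)
    ])
    # Work with the small (classes x classes) matrix instead of the (p x p) scatter.
    eigvals, eigvecs = np.linalg.eigh(Phi.T @ Phi)
    top = np.argsort(eigvals)[::-1][:r]
    Q, lam = eigvecs[:, top], eigvals[top]
    # Phi @ Q are unnormalized eigenvectors of S_b; dividing by lam whitens S_b.
    return (Phi @ Q) / lam


def dca_fuse(X, Y, labels, r):
    """Project two feature sets with DCA and concatenate them (illustrative sketch).

    r must not exceed (number of classes - 1) or the rank of either feature set.
    Returns the transformed sets (r x n each) and their concatenation (2r x n).
    """
    Xp = between_class_whitener(X, labels, r).T @ X
    Yp = between_class_whitener(Y, labels, r).T @ Y
    # Diagonalize the cross-product of the two projected sets with an SVD.
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)
    Xs = (U / np.sqrt(s)).T @ Xp
    Ys = (Vt.T / np.sqrt(s)).T @ Yp
    return Xs, Ys, np.vstack([Xs, Ys])
```

In a VQA setting, `X` and `Y` would be two image feature matrices with samples in the same column order, and `labels` would hold the answer-class index of each sample. The data is not re-centered before the SVD in this sketch, and summing `Xs` and `Ys` is an alternative to concatenation.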
