Image and Vision Computing

From known to the unknown: Transferring knowledge to answer questions about novel visual and semantic concepts

Abstract

Current Visual Question Answering (VQA) systems can answer intelligent questions about 'known' visual content. However, their performance drops significantly when questions about visually and linguistically 'unknown' concepts are presented during inference ('Open-world' scenario). A practical VQA system should be able to deal with novel concepts in real world settings. To address this problem, we propose an exemplar-based approach that transfers learning (i.e., knowledge) from previously 'known' concepts to answer questions about the 'unknown'. We learn a highly discriminative joint embedding (JE) space, where visual and semantic features are fused to give a unified representation. Once novel concepts are presented to the model, it looks for the closest match from an exemplar set in the JE space. This auxiliary information is used alongside the given Image-Question pair to refine visual attention in a hierarchical fashion. Our novel attention model is based on a dual-attention mechanism that combines the complementary effect of spatial and channel attention. Since handling the high dimensional exemplars on large datasets can be a significant challenge, we introduce an efficient matching scheme that uses a compact feature description for search and retrieval. To evaluate our model, we propose a new dataset for VQA, separating unknown visual and semantic concepts from the training set. Our approach shows significant improvements over state-of-the-art VQA models on the proposed Open-World VQA dataset and other standard VQA datasets. (c) 2020 Elsevier B.V. All rights reserved.
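The exemplar lookup described above can be sketched in a few lines. This is only an illustrative toy, not the paper's formulation: the fusion by L2-normalized concatenation and the cosine-similarity search are assumptions standing in for the learned joint embedding and the compact-descriptor matching scheme.

```python
import numpy as np

def fuse(visual, semantic):
    # Hypothetical fusion: concatenate visual and semantic features and
    # L2-normalize to form a joint embedding (JE) vector. The paper learns
    # this embedding; here it is a fixed stand-in for illustration.
    joint = np.concatenate([visual, semantic])
    return joint / np.linalg.norm(joint)

def nearest_exemplar(query, exemplars):
    # Cosine-similarity search over the exemplar set; vectors are
    # unit-normalized, so a dot product suffices.
    sims = exemplars @ query
    best = int(np.argmax(sims))
    return best, float(sims[best])

# Toy example: 3 exemplars built from 4-d visual + 4-d semantic features.
rng = np.random.default_rng(0)
exemplar_set = np.stack(
    [fuse(rng.normal(size=4), rng.normal(size=4)) for _ in range(3)]
)
query = fuse(rng.normal(size=4), rng.normal(size=4))
idx, sim = nearest_exemplar(query, exemplar_set)
```

The retrieved exemplar (`idx`) would then be passed, alongside the Image-Question pair, to the hierarchical attention stage; in practice the paper replaces the brute-force dot product with a compact feature description to keep retrieval tractable on large exemplar sets.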
