Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Learning Answer Embeddings for Visual Question Answering

Abstract

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly, and the other for the answers. The learning objective is to find the parameterization of those embeddings such that the correct answer has a higher likelihood than the other possible answers. In contrast to several existing approaches that treat Visual QA as multi-way classification, the proposed approach takes the semantic relationships among answers (as characterized by the embeddings) into consideration, instead of viewing the answers as independent ordinal numbers. Thus, the learned embedding function can be used to embed answers that are unseen in the training dataset. These properties make the approach particularly appealing for transfer learning in open-ended Visual QA, where the source dataset on which the model is learned has limited overlap with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic model. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results show that the approach performs well not only on in-domain learning but also on transfer learning.
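The two-embedding idea can be illustrated with a minimal sketch. The abstract does not specify the scoring function or the networks that produce the embeddings, so everything here is an assumption made for illustration: the inner-product score, the softmax normalization over a small candidate set, the random vectors standing in for learned embeddings, and the names `answer_log_probs` and `negative_log_likelihood`.

```python
import numpy as np

def answer_log_probs(fused, answers):
    """Log-probability of each candidate answer given one image-question pair.

    fused:   (d,) joint image-question embedding (assumed to be given).
    answers: (n, d) matrix with one embedding per candidate answer.
    Assumption: compatibility is an inner product, normalized by a softmax.
    """
    scores = answers @ fused                  # shape (n,)
    scores = scores - scores.max()            # shift for numerical stability
    return scores - np.log(np.exp(scores).sum())

def negative_log_likelihood(fused, answers, correct_idx):
    """Training loss for one example: -log p(correct answer | image, question)."""
    return -answer_log_probs(fused, answers)[correct_idx]

# Random stand-ins for learned embeddings (hypothetical data).
rng = np.random.default_rng(0)
answers = rng.normal(size=(5, 8))
answers /= np.linalg.norm(answers, axis=1, keepdims=True)  # unit-norm rows

# Place the image-question embedding along answer 2, so answer 2 scores highest.
fused = 3.0 * answers[2]
log_probs = answer_log_probs(fused, answers)
loss = negative_log_likelihood(fused, answers, correct_idx=2)

# An answer absent from the original candidate set only needs an embedding;
# no output layer has to be resized, unlike in multi-way classification.
unseen = rng.normal(size=(1, 8))
unseen /= np.linalg.norm(unseen, axis=1, keepdims=True)
extended_log_probs = answer_log_probs(fused, np.vstack([answers, unseen]))
```

Because the sum in the normalizer runs over every candidate answer, its cost grows with the size of the answer set, which is the normalization challenge the abstract mentions for datasets with many answers.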