Conference on Computer and Robot Vision

Generalized Hadamard-Product Fusion Operators for Visual Question Answering


Abstract

We propose a generalized class of multimodal fusion operators for the task of visual question answering (VQA). We identify generalizations of existing multimodal fusion operators based on the Hadamard product, and show that specific non-trivial instantiations of this generalized fusion operator achieve higher Open-Ended accuracy on the VQA task. In particular, we introduce Nonlinearity Ensembling, Feature Gating, and post-fusion neural network layers as fusion operator components, culminating in an absolute improvement of 1.1 percentage points on the VQA 2.0 test-dev set over baseline fusion operators that use the same input features. We take these findings as evidence that, when used as the search space of an architecture search over fusion operators, our generalized class of fusion operators could lead to the discovery of even better task-specific operators.
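The abstract names its fusion components without defining them. As a rough illustration only, the following is a minimal NumPy sketch of a Hadamard-product fusion operator in the style of multimodal low-rank bilinear (MLB) pooling, with optional switches for the three kinds of components the abstract lists. The function name `hadamard_fusion`, the parameter dictionary keys `U`, `V`, `G`, and the concrete forms used here are hypothetical assumptions, not the authors' definitions: "nonlinearity ensembling" is assumed to mean averaging several activations, "feature gating" is assumed to be a sigmoid gate computed from the concatenated inputs, and the post-fusion layers are assumed to be plain ReLU layers.

```python
import numpy as np


def relu(x):
    # Rectified linear unit, used as an extra activation and in post-fusion layers.
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hadamard_fusion(q, v, params, nonlinearities=(np.tanh,), gate=False, post_layers=()):
    """Fuse a question vector q and an image vector v with a Hadamard product.

    Hypothetical sketch: the component forms are illustrative assumptions,
    not the formulation from the paper.
    """
    # Project both modalities into a shared d-dimensional space.
    hq = params["U"] @ q
    hv = params["V"] @ v
    # Assumed "nonlinearity ensembling": average several activations
    # applied to each projection before the elementwise product.
    fq = np.mean([f(hq) for f in nonlinearities], axis=0)
    fv = np.mean([f(hv) for f in nonlinearities], axis=0)
    z = fq * fv  # Hadamard (elementwise) product
    if gate:
        # Assumed "feature gating": per-feature weights in (0, 1) from both inputs.
        g = sigmoid(params["G"] @ np.concatenate([q, v]))
        z = g * z
    # Assumed post-fusion layers: square ReLU layers applied to the fused vector.
    for W in post_layers:
        z = relu(W @ z)
    return z


rng = np.random.default_rng(0)
dq, dv, d = 4, 6, 8  # illustrative sizes only
params = {
    "U": rng.standard_normal((d, dq)),
    "V": rng.standard_normal((d, dv)),
    "G": rng.standard_normal((d, dq + dv)),
}
q = rng.standard_normal(dq)
v = rng.standard_normal(dv)

baseline = hadamard_fusion(q, v, params)  # tanh-only elementwise product
gated = hadamard_fusion(q, v, params, gate=True)
generalized = hadamard_fusion(
    q, v, params,
    nonlinearities=(np.tanh, relu),
    gate=True,
    post_layers=[rng.standard_normal((d, d))],
)
print(baseline.shape, generalized.shape)  # (8,) (8,)
```

With the default arguments the function reduces to the plain product of tanh-activated projections; each keyword argument then switches on one generalization independently, which loosely mirrors the abstract's idea of treating these components as a search space over fusion operators.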