Generalized Hadamard-Product Fusion Operators for Visual Question Answering


Abstract

We propose a generalized class of multimodal fusion operators for the task of visual question answering (VQA). We identify generalizations of existing multimodal fusion operators based on the Hadamard product, and show that specific non-trivial instantiations of this generalized fusion operator exhibit superior performance in terms of OpenEnded accuracy on the VQA task. In particular, we introduce Nonlinearity Ensembling, Feature Gating, and post-fusion neural network layers as fusion operator components, culminating in an absolute percentage point improvement of 1.1% on the VQA 2.0 test-dev set over baseline fusion operators, which use the same features as input. We use our findings as evidence that our generalized class of fusion operators could lead to the discovery of even superior task-specific operators when used as a search space in an architecture search over fusion operators.
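The abstract names the components (a Hadamard-product baseline, Nonlinearity Ensembling, Feature Gating) but gives no formulas, so the following is only a minimal sketch of how such operators might look. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the tanh projections in the baseline, the choice of summing the Hadamard product over several nonlinearities, the sigmoid gate computed from the question vector, and all function and variable names.

```python
import math
import random

def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def relu(a):
    return max(0.0, a)

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

def baseline_fusion(v, q, w_v, w_q):
    """Assumed baseline: project each modality, apply tanh, multiply element-wise."""
    hv = [math.tanh(a) for a in linear(w_v, v)]
    hq = [math.tanh(a) for a in linear(w_q, q)]
    return hadamard(hv, hq)

def ensembled_gated_fusion(v, q, w_v, w_q, w_gate,
                           nonlinearities=(math.tanh, relu)):
    """Hypothetical generalization: sum Hadamard products over several
    nonlinearities, then scale each feature by a sigmoid gate.
    The gate is computed from the question vector (an assumption)."""
    pv = linear(w_v, v)
    pq = linear(w_q, q)
    fused = [0.0] * len(pv)
    for f in nonlinearities:
        term = hadamard([f(a) for a in pv], [f(a) for a in pq])
        fused = [acc + t for acc, t in zip(fused, term)]
    gate = [sigmoid(a) for a in linear(w_gate, q)]
    return hadamard(gate, fused)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy inputs: a 6-d "image" feature and a 4-d "question" feature,
# both fused into a shared 3-d space.
rng = random.Random(0)
v = [rng.uniform(-1, 1) for _ in range(6)]
q = [rng.uniform(-1, 1) for _ in range(4)]
w_v = random_matrix(3, 6, rng)
w_q = random_matrix(3, 4, rng)
w_gate = random_matrix(3, 4, rng)

z_base = baseline_fusion(v, q, w_v, w_q)
z_gen = ensembled_gated_fusion(v, q, w_v, w_q, w_gate)
```

In this sketch the baseline is a special case of the generalized operator: passing only `math.tanh` as the nonlinearity yields the baseline output scaled by the gate. That nesting is the sense in which a family of operators like this could serve as a search space.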


Bibliographic details

Source
Conference on Computer and Robot Vision | 2018 | pp. 39-46 | 8 pages
Authors
Brendan Duke; Graham W. Taylor
Original format
PDF
Keywords
Feature extraction; Visualization; Task analysis; Data models; Mathematical model; Natural languages


