TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

Abstract

Reasoning is an important ability that we learn from a very early age. Yet, reasoning is extremely hard for algorithms. Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the visual commonsense reasoning (VCR) task has recently been introduced. Models not only have to answer questions, they also have to provide a reason for the given answer. The proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model, obtained by ablating and pruning the existing intricate baseline, can perform better with half the number of trainable parameters. By associating visual features with attribute information and by improving text-to-image grounding, we obtain further improvements for our simpler and effective baseline, TAB-VCR. We show that this approach results in a 5.3%, 4.4% and 6.5% absolute improvement over the previous state-of-the-art [103] on question answering, answer justification and holistic VCR, respectively.

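Bibliographic details

Source
Conference on Neural Information Processing Systems | 2020 | pp. 15071-15901 | 14 pages
Conference location
Authors
Jingxiang Lin; Unnat Jain; Alexander G. Schwing
Author affiliations

Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: Metrology
Keywords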

Similar literature

1. Multi-Level Knowledge Injecting for Visual Commonsense Reasoning [J]. Wen Zhang, Peng Yuxin. IEEE Transactions on Circuits and Systems for Video Technology. 2021, No. 3
2. Commonsense reasoning and commonsense knowledge in artificial intelligence [J]. Lalit Saxena. Computing Reviews. 2016, No. 8
3. Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence [J]. Davis Ernest, Marcus Gary. Communications of the ACM. 2015, No. 9
4. TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines [C]. Jingxiang Lin, Unnat Jain, Alexander G. Schwing. Conference on Neural Information Processing Systems. 2020
5. Visual Commonsense Reasoning: Functionality, Physics, Causality, and Utility [D]. Zhu, Yixin. 2018
6. Similarity measures and attribute selection for case-based reasoning in transcatheter aortic valve implantation [O]. Hélène Feuillâtre, Vincent Auffret, Miguel Castro, 2020
7. Vision–Language–Knowledge Co-Embedding for Visual Commonsense Reasoning [O]. JaeYun Lee, Incheol Kim. 2021
