Conference on Neural Information Processing Systems

TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

Abstract

Reasoning is an important ability that we learn from a very early age, yet it is extremely hard for algorithms. Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets. To develop models with better reasoning abilities, the visual commonsense reasoning (VCR) task was recently introduced. Models have to not only answer questions but also provide a reason for the given answer. The proposed baseline achieved compelling results, leveraging a meticulously designed model composed of LSTM modules and attention nets. Here we show that a much simpler model, obtained by ablating and pruning the existing intricate baseline, can perform better with half the number of trainable parameters. By associating visual features with attribute information and improving text-to-image grounding, we obtain further improvements for our simpler and effective baseline, TAB-VCR. We show that this approach results in 5.3%, 4.4%, and 6.5% absolute improvements over the previous state-of-the-art [103] on question answering, answer justification, and holistic VCR, respectively.
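
The three reported numbers correspond to three accuracy measures: question answering, answer justification, and the holistic setting. The sketch below shows one way such accuracies could be computed. It assumes the usual VCR convention that both subtasks are multiple choice, that rationales are chosen given the correct answer, and that the holistic score counts an example only when both the answer and the rationale are right. The function name `vcr_accuracies`, its arguments, and the dictionary keys are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def vcr_accuracies(
    answer_pred: Sequence[int],
    answer_gold: Sequence[int],
    rationale_pred: Sequence[int],
    rationale_gold: Sequence[int],
) -> Dict[str, float]:
    """Compute the three accuracies from chosen option indices.

    Assumption: each list holds one multiple-choice index per example,
    and all four lists are aligned example by example.
    """
    n = len(answer_gold)
    if n == 0:
        raise ValueError("at least one example is required")
    if not (len(answer_pred) == len(rationale_pred) == len(rationale_gold) == n):
        raise ValueError("all inputs must have the same length")

    # Per-example correctness for each subtask.
    answer_hits = [p == g for p, g in zip(answer_pred, answer_gold)]
    rationale_hits = [p == g for p, g in zip(rationale_pred, rationale_gold)]

    # Holistic setting: an example counts only if both choices are correct.
    joint_hits = [a and r for a, r in zip(answer_hits, rationale_hits)]

    return {
        "question_answering": sum(answer_hits) / n,
        "answer_justification": sum(rationale_hits) / n,
        "holistic": sum(joint_hits) / n,
    }


if __name__ == "__main__":
    scores = vcr_accuracies(
        answer_pred=[0, 1, 2, 3],
        answer_gold=[0, 1, 2, 0],
        rationale_pred=[1, 1, 0, 2],
        rationale_gold=[1, 0, 0, 2],
    )
    print(scores)
```

Because the holistic score requires both choices to be correct on the same example, it can never exceed either of the other two accuracies.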