International Conference on Computer Vision

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

Abstract

Many vision and language models suffer from poor visual grounding -- often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image. In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. HINT encourages deep networks to be sensitive to the same input regions as humans. Our approach optimizes the alignment between human attention maps and gradient-based network importances -- ensuring that models learn not just to look at but rather rely on visual concepts that humans found relevant for a task when making predictions. We apply HINT to Visual Question Answering and Image Captioning tasks, outperforming top approaches on splits that penalize over-reliance on language priors (VQA-CP and robust captioning) using human attention demonstrations for just 6% of the training data.
