Annual Meeting of the Association for Computational Linguistics

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation


Abstract

Vision-and-Language Navigation (VLN) requires grounding instructions, such as "turn right" and "stop at the door", to routes in a visual environment. The actual grounding can connect language to the environment through multiple modalities, e.g. "stop at the door" might ground into visual objects, while "turn right" might rely only on the geometric structure of a route. We investigate where the natural language empirically grounds under two recent state-of-the-art VLN models. Surprisingly, we discover that visual features may actually hurt these models: models which only use route structure, ablating visual features, outperform their visual counterparts in unseen new environments on the benchmark Room-to-Room dataset. To better use all the available modalities, we propose to decompose the grounding procedure into a set of expert models with access to different modalities (including object detections) and ensemble them at prediction time, improving the performance of state-of-the-art models on the VLN task.
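The modality-decomposition idea lends itself to a simple prediction-time ensemble. Below is a minimal sketch in Python, assuming each expert is a callable that maps the current observation to action logits; the function name and the uniform-averaging combination rule are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def ensemble_action_probs(experts, observation):
    """Average the action distributions of per-modality expert policies.

    `experts` is a list of models, each grounding the instruction through a
    different modality (e.g. route structure only, visual appearance, or
    object detections). The uniform average here is a hypothetical
    combination rule for illustration.
    """
    probs = [torch.softmax(expert(observation), dim=-1) for expert in experts]
    return torch.stack(probs).mean(dim=0)

# At each navigation step the agent could act greedily on the mixture:
# action = ensemble_action_probs(experts, obs).argmax(dim=-1)
```

Averaging probabilities (rather than logits) keeps each expert's contribution on a common scale, so an expert that is confidently wrong in one modality can be outvoted by the others.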
