
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments




A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator, a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings: the Room-to-Room (R2R) dataset.
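The abstract frames instruction following as a visually grounded sequence-to-sequence translation problem. The following is a toy sketch of that framing under stated assumptions. It is not the paper's model and not the Matterport3D Simulator API: the corridor of rooms, the `observe` and `keyword_policy` functions, and the `forward`/`stop` action set are all hypothetical. What it illustrates is the difference from text-only translation: each output action is conditioned both on the whole instruction and on the observation produced by the actions taken so far.

```python
from typing import Callable, List, Sequence

STOP = "stop"

# Hypothetical step function type: (instruction tokens, current observation,
# actions so far) -> next action. A learned agent would use a neural decoder here.
StepFn = Callable[[Sequence[str], str, List[str]], str]


def decode_actions(instruction: Sequence[str],
                   observe: Callable[[List[str]], str],
                   step: StepFn,
                   max_steps: int = 10) -> List[str]:
    """Greedily decode an action sequence, one visually grounded step at a time."""
    actions: List[str] = []
    for _ in range(max_steps):
        obs = observe(actions)            # what the agent sees after its past actions
        action = step(instruction, obs, actions)
        actions.append(action)
        if action == STOP:                # STOP plays the role of an end-of-sequence token
            break
    return actions


# --- Toy world (hypothetical): a straight corridor of labelled rooms ---
ROOMS = ["hallway", "bedroom", "kitchen", "bathroom"]


def observe(actions: List[str]) -> str:
    # Position is the number of "forward" moves, clamped to the end of the corridor.
    return ROOMS[min(actions.count("forward"), len(ROOMS) - 1)]


def keyword_policy(instruction: Sequence[str], obs: str, actions: List[str]) -> str:
    # Stop once the visible room is named in the instruction; otherwise keep walking.
    return STOP if obs in instruction else "forward"


if __name__ == "__main__":
    tokens = "walk down the corridor and stop in the kitchen".split()
    print(decode_actions(tokens, observe, keyword_policy))  # prints ['forward', 'forward', 'stop']
```

Swapping `keyword_policy` for a learned decoder and `observe` for image features rendered by a simulator would roughly correspond to the encoder-decoder setting the abstract alludes to; the decoding loop itself would stay the same.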


