Conference: IEEE Winter Conference on Applications of Computer Vision

Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation



Abstract

We humans can impeccably search for a target object, given only its name, even in an unseen environment. We argue that this ability is largely due to three main factors: the incorporation of prior knowledge (or experience), its adaptation to the new environment using observed visual cues, and, most importantly, searching optimistically without giving up early. These are currently missing from state-of-the-art visual navigation methods based on Reinforcement Learning (RL). In this paper, we propose to use externally learned prior knowledge of relative object locations and to integrate it into our model by constructing a neural graph. To incorporate the graph efficiently without increasing the state-space complexity, we propose a Graph-based Value Estimation (GVE) module. GVE provides a more accurate baseline for estimating the Advantage function in actor-critic RL algorithms. This reduces the value estimation error and, consequently, leads to convergence to a better policy. Through empirical studies, we show that our agent, dubbed the optimistic agent, has a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate. Our extensive ablation studies show the efficacy of our simple method, which achieves state-of-the-art results as measured by the conventional visual navigation metrics, e.g. Success Rate (SR) and Success weighted by Path Length (SPL), in the AI2THOR environment.
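The abstract names three quantities without giving their formulas: the Advantage estimated against a value baseline, SR, and SPL. The sketch below uses their commonly used textbook definitions, which is an assumption here, since the abstract does not state which variants the paper uses. It does not implement GVE itself. All function and parameter names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage estimate A_t = G_t - V(s_t), with V(s_t) as the baseline.

    Any error in the baseline V(s_t) shifts A_t directly, which is why a more
    accurate value estimate matters for the policy update.
    """
    return [g - v for g, v in zip(returns, values)]


def success_rate(successes: Sequence[int]) -> float:
    """SR: fraction of episodes that ended in success (1) rather than failure (0)."""
    return sum(successes) / len(successes)


def spl(successes: Sequence[int],
        shortest_lengths: Sequence[float],
        taken_lengths: Sequence[float]) -> float:
    """SPL: mean over episodes of S_i * l_i / max(p_i, l_i).

    l_i is the shortest path length (assumed > 0) and p_i is the length of the
    path the agent actually took.
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


if __name__ == "__main__":
    g = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
    print(g, advantages(g, [0.25, 0.25, 0.5]))
    print(success_rate([1, 0, 1]), spl([1, 0, 1], [2.0, 3.0, 4.0], [4.0, 5.0, 4.0]))
```

In this toy example the second episode fails and contributes zero to SPL, while the first succeeds along a path twice the shortest length and contributes 0.5.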
