Reward Learning from Narrated Demonstrations

Abstract

Humans effortlessly 'program' one another by communicating goals and desires in natural language. In contrast, humans program robotic behaviours by indicating desired object locations and poses to be achieved [5], by providing RGB images of goal configurations [19], or supplying a demonstration to be imitated [17]. None of these methods generalize across environment variations, and they convey the goal in awkward technical terms. This work proposes joint learning of natural language grounding and instructable behavioural policies reinforced by perceptual detectors of natural language expressions, grounded to the sensory inputs of the robotic agent. Our supervision is narrated visual demonstrations (NVD), which are visual demonstrations paired with verbal narration (as opposed to being silent). We introduce a dataset of NVD where teachers perform activities while describing them in detail. We map the teachers' descriptions to perceptual reward detectors, and use them to train corresponding behavioural policies in simulation. We empirically show that our instructable agents (i) learn visual reward detectors using a small number of examples by exploiting hard negative mined configurations from demonstration dynamics, (ii) develop pick-and-place policies using learned visual reward detectors, (iii) benefit from object-factorized state representations that mimic the syntactic structure of natural language goal expressions, and (iv) can execute behaviours that involve novel objects in novel locations at test time, instructed by natural language.
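
As a rough illustration of point (i), the sketch below shows one way training examples for a reward detector might be mined from a single narrated demonstration. The abstract does not say how the hard negatives are chosen, so the labelling rule here is an assumption: the final frames of a demonstration are treated as satisfying the narrated goal, and the frames just before them are treated as hard negatives because they look similar but do not yet satisfy it. The names `Demo`, `mine_examples`, and `sparse_reward`, the two-number frame feature, and the example narration are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-frame feature: (dx, dy) offset between the two objects
# named in the narration. A real system would use image features instead.
Frame = Tuple[float, float]


@dataclass
class Demo:
    narration: str        # e.g. the teacher's spoken description of the goal
    frames: List[Frame]   # frames in temporal order


def mine_examples(demo: Demo, n_pos: int = 1, n_hard: int = 2) -> List[Tuple[Frame, int]]:
    """Assumed labelling rule, not the paper's: the last n_pos frames are
    positives (label 1); the n_hard frames just before them are hard
    negatives (label 0)."""
    if len(demo.frames) < n_pos + 1:
        raise ValueError("demonstration too short to mine any negatives")
    positives = demo.frames[-n_pos:]
    hard_negatives = demo.frames[-(n_pos + n_hard):-n_pos]
    return [(f, 1) for f in positives] + [(f, 0) for f in hard_negatives]


def sparse_reward(detector_score: float, threshold: float = 0.5) -> float:
    """Turn a detector's confidence into a binary reward for policy training.
    The threshold value is an illustrative assumption."""
    return 1.0 if detector_score >= threshold else 0.0


demo = Demo(
    narration="put the apple on the plate",
    frames=[(0.9, 0.4), (0.6, 0.3), (0.3, 0.1), (0.1, 0.05), (0.0, 0.0)],
)
examples = mine_examples(demo)
```

Under these assumptions, a detector trained on `examples` would be pushed to separate the goal configuration from its near misses, and `sparse_reward` shows how its output could then act as the reward signal for a pick-and-place policy.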