Reward Learning from Narrated Demonstrations

Abstract

Humans effortlessly 'program' one another by communicating goals and desires in natural language. In contrast, humans program robotic behaviours by indicating desired object locations and poses to be achieved [5], by providing RGB images of goal configurations [19], or by supplying a demonstration to be imitated [17]. None of these methods generalize across environment variations, and they convey the goal in awkward technical terms. This work proposes joint learning of natural language grounding and instructable behavioural policies reinforced by perceptual detectors of natural language expressions, grounded to the sensory inputs of the robotic agent. Our supervision is narrated visual demonstrations (NVD), which are visual demonstrations paired with verbal narration (as opposed to being silent). We introduce a dataset of NVD where teachers perform activities while describing them in detail. We map the teachers' descriptions to perceptual reward detectors, and use them to train corresponding behavioural policies in simulation. We empirically show that our instructable agents (i) learn visual reward detectors using a small number of examples by exploiting hard-negative-mined configurations from demonstration dynamics, (ii) develop pick-and-place policies using learned visual reward detectors, (iii) benefit from object-factorized state representations that mimic the syntactic structure of natural language goal expressions, and (iv) can execute behaviours that involve novel objects in novel locations at test time, instructed by natural language.

Bibliographic details

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2018 | pp. 7004-7013 | 10 pages
Conference location: Salt Lake City (US)
Authors
Hsiao-Yu Tung; Adam W. Harley; Liang-Kang Huang; Katerina Fragkiadaki
Original format: PDF
Keywords
Visualization; Natural languages; Detectors; Grounding; Task analysis; Speech recognition; Microphones
Date indexed: 2022-08-26 14:35:28

Similar documents

1. Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards [J]. Guoyu Zuo, Qishen Zhao, Jiahao Lu. International Journal of Advanced Robotic Systems, 2020, No. 1
2. Learning motions from demonstrations and rewards with time-invariant dynamical systems based policies [J]. Rey Joel, Kronander Klas, Farshidian Farbod. Autonomous Robots, 2018, No. 1
3. Bayesian Nonparametric Reward Learning From Demonstration [J]. Michini Bernard, Walsh Thomas J., Agha-Mohammadi Ali-Akbar. IEEE Transactions on Robotics, 2015, No. 2
4. Reward Learning from Narrated Demonstrations [C]. Hsiao-Yu Tung, Adam W. Harley, Liang-Kang Huang. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
5. Effects of Nicotine Withdrawal on Motivation, Reward Sensitivity and Reward-Learning [D]. Oliver, Jason A. 2015
6. Mouse Strain Differences in Opiate Reward Learning Are Explained by Differences in Anxiety Not Reward or Learning [O]. Colleen L. Dockstader, Derek van der Kooy. 2001
7. Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [O]. Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman. 2018
