JMLR: Workshop and Conference Proceedings

DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

Abstract

An option is a short-term skill consisting of a control policy for a specified region of the state space and a termination condition that recognizes leaving that region. In prior work, we proposed an algorithm called Deep Discovery of Options (DDO) to discover options that accelerate reinforcement learning in Atari games. This paper studies an extension to robot imitation learning, called Discovery of Deep Continuous Options (DDCO), in which low-level continuous control skills parametrized by deep neural networks are learned from demonstrations. We extend DDO with: (1) a hybrid categorical-continuous distribution model to parametrize high-level policies that can invoke discrete options as well as continuous control actions, and (2) a cross-validation method that relaxes DDO's requirement that users specify the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot in the vertical plane pushing a block under friction and gravity, and in two physical experiments on the da Vinci surgical robot: needle insertion, where a needle is grasped and inserted into a silicone tissue phantom, and needle bin picking, where needles and pins are grasped from a pile and sorted into bins. In the 3-link arm simulation, results suggest that DDCO requires 3x fewer demonstrations to achieve the same reward as a baseline imitation learning approach. In the needle insertion task, DDCO succeeded in 8/10 trials, compared to 6/10 for the next most accurate imitation learning baseline. In the surgical bin picking task, the learned policy successfully grasped a single object in 66 of 99 attempted grasps and, in all but one case, recovered from failed grasps by retrying a second time.
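
To make the hybrid categorical-continuous parametrization concrete, here is a minimal PyTorch sketch. This is not the paper's code: the names (Option, HybridHighLevelPolicy), network sizes, and the particular factorization are hypothetical assumptions for illustration. It represents an option as a policy/termination pair and a high-level head that either selects one of K discrete options or emits a continuous control action; in DDCO, K itself would be chosen by the cross-validation procedure rather than fixed by the user.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal


@dataclass
class Option:
    """An option: a low-level control policy plus a termination condition."""
    policy: nn.Module        # pi(a | s): control policy for a region of state space
    termination: nn.Module   # psi(s): probability of terminating (leaving the region)


class HybridHighLevelPolicy(nn.Module):
    """High-level policy with a hybrid categorical-continuous head:
    it either invokes one of K discrete options or emits a continuous
    control action directly."""

    def __init__(self, state_dim: int, action_dim: int, num_options: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # K logits for the discrete options, plus one extra logit for "act directly".
        self.choice_logits = nn.Linear(hidden, num_options + 1)
        # Diagonal Gaussian over the direct continuous control action.
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        choice = Categorical(logits=self.choice_logits(h))
        action = Normal(self.mu(h), self.log_std.exp())
        return choice, action


# Usage: sample a choice; if it indexes an option, run that option's
# low-level policy until its termination condition fires; otherwise
# sample a continuous action from the Gaussian head.
policy = HybridHighLevelPolicy(state_dim=6, action_dim=2, num_options=4)
choice_dist, action_dist = policy(torch.randn(1, 6))
k = choice_dist.sample()   # index in {0..4}; here 4 means "act directly"
a = action_dist.sample()   # continuous action, used when k == num_options
```

The extra (K+1)-th logit is one simple way to fold discrete option invocation and direct continuous control into a single distribution; the paper's exact factorization may differ.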
