JMLR: Workshop and Conference Proceedings

DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations

Abstract

An option is a short-term skill consisting of a control policy for a specified region of the state space and a termination condition that recognizes leaving that region. In prior work, we proposed an algorithm called Deep Discovery of Options (DDO) to discover options that accelerate reinforcement learning in Atari games. This paper studies an extension to robot imitation learning, called Discovery of Deep Continuous Options (DDCO), in which low-level continuous control skills parametrized by deep neural networks are learned from demonstrations. We extend DDO with: (1) a hybrid categorical-continuous distribution model to parametrize high-level policies that can invoke discrete options as well as continuous control actions, and (2) a cross-validation method that relaxes DDO's requirement that users specify the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot in the vertical plane pushing a block under friction and gravity, and in two physical experiments on the da Vinci surgical robot: needle insertion, where a needle is grasped and inserted into a silicone tissue phantom, and needle bin picking, where needles and pins are grasped from a pile and sorted into bins. In the 3-link arm simulation, results suggest that DDCO requires 3x fewer demonstrations to achieve the same reward as a baseline imitation learning approach. In the needle insertion task, DDCO succeeded in 8/10 trials, compared to 6/10 for the next most accurate imitation learning baseline. In the surgical bin picking task, the learned policy successfully grasped a single object in 66 of 99 attempted grasps and, in all but one case, recovered from failed grasps by retrying a second time.
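
To make the hybrid categorical-continuous parametrization concrete, here is a minimal PyTorch sketch. This is not the paper's code: the names (Option, HybridHighLevelPolicy), network sizes, and the particular factorization are hypothetical assumptions for illustration. It represents an option as a policy/termination pair and a high-level head that either selects one of K discrete options or emits a continuous control action; in DDCO, K itself would be chosen by the cross-validation procedure rather than fixed by the user.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal


@dataclass
class Option:
    """An option: a low-level control policy plus a termination condition."""
    policy: nn.Module        # pi(a | s): control policy for a region of state space
    termination: nn.Module   # psi(s): probability of terminating (leaving the region)


class HybridHighLevelPolicy(nn.Module):
    """High-level policy with a hybrid categorical-continuous head:
    it either invokes one of K discrete options or emits a continuous
    control action directly."""

    def __init__(self, state_dim: int, action_dim: int, num_options: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # K logits for the discrete options, plus one extra logit for "act directly".
        self.choice_logits = nn.Linear(hidden, num_options + 1)
        # Diagonal Gaussian over the direct continuous control action.
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        choice = Categorical(logits=self.choice_logits(h))
        action = Normal(self.mu(h), self.log_std.exp())
        return choice, action


# Usage: sample a choice; if it indexes an option, run that option's
# low-level policy until its termination condition fires; otherwise
# sample a continuous action from the Gaussian head.
policy = HybridHighLevelPolicy(state_dim=6, action_dim=2, num_options=4)
choice_dist, action_dist = policy(torch.randn(1, 6))
k = choice_dist.sample()   # index in {0..4}; here 4 means "act directly"
a = action_dist.sample()   # continuous action, used when k == num_options
```

The extra (K+1)-th logit is one simple way to fold discrete option invocation and direct continuous control into a single distribution; the paper's exact factorization may differ.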
