Conference: IEEE International Conference on Robotics and Automation

SHIV: Reducing supervisor burden in DAgger using support vectors for efficient learning from demonstrations in high dimensional state spaces

Abstract

Online learning-from-demonstration algorithms such as DAgger can learn policies for problems where the system dynamics and the cost function are unknown. However, they burden supervisors, who must respond to queries each time the robot encounters new states while executing its current best policy. The MMD-IL algorithm reduces supervisor burden by filtering queries with insufficient discrepancy in distribution and by maintaining multiple policies. We introduce the SHIV algorithm (Svm-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary, high-dimensional state distributions. To facilitate scaling and outlier rejection, filtering is based on a measure of risk defined in terms of distance to an approximate level-set boundary defined by a One-Class support vector machine. We report on experiments in three contexts: 1) a driving simulator with a 27,936-dimensional visual feature space, 2) a simulation of push-grasping in clutter with a 22-dimensional state space, and 3) physical surgical needle insertion with a 16-dimensional state space. Results suggest that SHIV can efficiently learn policies with up to 70% fewer queries than DAgger.
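The query-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in place of a One-Class support vector machine it uses a plain Parzen-window (mean RBF kernel) score with a fixed threshold as a stand-in for the level-set test. The function names and the `gamma` and `threshold` values are hypothetical choices made only for illustration.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel between two state vectors of equal length."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def support_score(state, known_states, gamma):
    """Mean kernel similarity to already-labeled states (Parzen-window score).

    Assumption: this stands in for the One-Class SVM decision function.
    """
    return sum(rbf(state, k, gamma) for k in known_states) / len(known_states)

def is_risky(state, known_states, gamma=1.0, threshold=0.1):
    """A state is 'risky' when it falls outside the estimated level set."""
    return support_score(state, known_states, gamma) < threshold

def filtered_iteration(rollout_states, dataset, supervisor,
                       gamma=1.0, threshold=0.1):
    """One DAgger-style aggregation pass that queries only on risky states.

    dataset is a list of (state, label) pairs and is extended in place.
    Returns the number of supervisor queries issued.
    """
    known = [s for s, _ in dataset]
    risky = [s for s in rollout_states
             if not known or is_risky(s, known, gamma, threshold)]
    dataset.extend((s, supervisor(s)) for s in risky)
    return len(risky)

# Toy usage: a hypothetical supervisor that steers toward the origin.
def toy_supervisor(state):
    return "left" if state[0] > 0 else "right"

dataset = [((0.0, 0.0), "right")]
rollout = [(0.1, 0.0), (5.0, 5.0)]
first_pass = filtered_iteration(rollout, dataset, toy_supervisor)
second_pass = filtered_iteration(rollout, dataset, toy_supervisor)
```

In this toy run, only the far-away state `(5.0, 5.0)` triggers a query on the first pass. Once it has been labeled and aggregated, the second pass over the same states issues no queries. Plain DAgger would have asked for a label at every visited state on both passes.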
