Conference: IEEE International Conference on Automation Science and Engineering

An algorithm and user study for teaching bilateral manipulation via iterated best response demonstrations


Abstract

Human demonstrations can be valuable for teaching robots to perform manipulation and coordination tasks. However, it can be difficult for human supervisors to provide demonstrations for multilateral (multi-arm) tasks, which require divided attention. In this paper, we propose a new algorithm, Bilateral Iterated Best Response (BIBR), which builds on the game-theoretic concept of Iterated Best Response. BIBR allows a supervisor to train each manipulator iteratively, reducing supervisor burden and improving the quality of demonstrations. We present a web-based user study in which 51 participants used a keyboard interface to control two agents in a GridWorld environment. The study confirms prior findings that, when the task is asymmetric, bilateral demonstrations are noisier and longer than demonstrations provided separately for either manipulator. Because unilateral demonstrations lack coordination, we propose learning coordinated bilateral policies from unilateral demonstrations: an estimated robot policy is rolled out for one arm while the human demonstrates for the other, and the estimated policy is updated iteratively. Compared to a bilateral demonstration baseline, BIBR improves the success rate of the learned policy in the asymmetric task from 29.17% to 55.55% in the first full round of demonstrations. Furthermore, the learned policies produce trajectories with 8.63% fewer steps that are also smoother, with 44.15% fewer changes in direction.
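The abstract describes the BIBR loop only at a high level: roll out the current estimate for one arm, let the human demonstrate the other arm, re-fit, and alternate. The following is a minimal Python sketch of that iterated-best-response structure under heavy assumptions that are not from the paper: a 5x5 GridWorld, a scripted `greedy_human` standing in for the human demonstrator, and tabular behaviour cloning (`fit_policy`) as the learner. All function names (`step`, `greedy_human`, `fit_policy`, `demonstrate`, `bibr`, `joint_rollout`, `direction_changes`) are hypothetical, and `direction_changes` is only one possible reading of the "changes in direction" smoothness metric.

```python
# Hypothetical sketch of an iterated-best-response demonstration loop;
# not the paper's implementation. Environment, learner and "human" are assumptions.
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Pos = Tuple[int, int]
State = Tuple[Pos, Pos]                 # (arm 0 position, arm 1 position)
Policy = Callable[[State], str]
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0), "S": (0, 0)}
GRID = 5                                # hypothetical 5x5 GridWorld


def step(pos: Pos, action: str) -> Pos:
    """Move one agent by `action`, clamped to the grid."""
    dx, dy = MOVES[action]
    return (min(max(pos[0] + dx, 0), GRID - 1), min(max(pos[1] + dy, 0), GRID - 1))


def greedy_human(arm: int, goal: Pos) -> Policy:
    """Hypothetical stand-in for a human demonstrator: walk toward `goal`, x first."""
    def act(state: State) -> str:
        x, y = state[arm]
        if x != goal[0]:
            return "R" if goal[0] > x else "L"
        if y != goal[1]:
            return "U" if goal[1] > y else "D"
        return "S"
    return act


def fit_policy(demos: List[Tuple[State, str]], arm: int) -> Policy:
    """Tabular behaviour cloning: majority action per joint state, then per own position."""
    joint: Dict[State, Counter] = defaultdict(Counter)
    own: Dict[Pos, Counter] = defaultdict(Counter)
    for state, action in demos:
        joint[state][action] += 1
        own[state[arm]][action] += 1

    def act(state: State) -> str:
        if state in joint:                  # exact joint state seen in a demo
            return joint[state].most_common(1)[0][0]
        if state[arm] in own:               # fall back to this arm's own position
            return own[state[arm]].most_common(1)[0][0]
        return "S"                          # unseen state: stay put
    return act


def demonstrate(start: State, arm: int, human: Policy, other: Policy, horizon: int):
    """One unilateral demo: the human drives `arm`, the estimated policy drives the other arm."""
    state, demos = start, []
    for _ in range(horizon):
        a_human, a_other = human(state), other(state)
        demos.append((state, a_human))      # only the human's arm is recorded
        actions = (a_human, a_other) if arm == 0 else (a_other, a_human)
        state = (step(state[0], actions[0]), step(state[1], actions[1]))
    return demos


def bibr(start: State, humans: List[Policy], rounds: int = 2,
         episodes: int = 3, horizon: int = 10) -> List[Policy]:
    """Alternately re-fit each arm as a best response to the other arm's current estimate."""
    policies: List[Policy] = [lambda s: "S", lambda s: "S"]   # initial estimates: stay put
    for _ in range(rounds):
        for arm in (0, 1):
            demos: List[Tuple[State, str]] = []
            for _ in range(episodes):
                demos += demonstrate(start, arm, humans[arm], policies[1 - arm], horizon)
            policies[arm] = fit_policy(demos, arm)
    return policies


def joint_rollout(start: State, policies: List[Policy], horizon: int = 10):
    """Run both learned policies together; return the final state and the action pairs."""
    state, traj = start, []
    for _ in range(horizon):
        a0, a1 = policies[0](state), policies[1](state)
        traj.append((a0, a1))
        state = (step(state[0], a0), step(state[1], a1))
    return state, traj


def direction_changes(actions: List[str]) -> int:
    """Count switches between consecutive non-stay moves (one possible smoothness metric)."""
    moving = [a for a in actions if a != "S"]
    return sum(1 for a, b in zip(moving, moving[1:]) if a != b)


start: State = ((0, 0), (4, 4))
humans = [greedy_human(0, (4, 0)), greedy_human(1, (0, 4))]
policies = bibr(start, humans)
final, traj = joint_rollout(start, policies)
print(final)                                        # ((4, 0), (0, 4))
print(direction_changes([a0 for a0, _ in traj]))    # 0
```

The fallback from the joint state to the arm's own position in `fit_policy` is a design choice made for this sketch only: a purely joint-state table freezes whenever the other arm's estimated policy leads to a state the demonstrator never saw, which would obscure the alternating structure being illustrated.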
