
Learning From Human Directional Corrections


Abstract

This article proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; because the human must carefully choose the magnitude of each correction, those methods can easily lead to overcorrection and inefficient learning. The proposed method requires only directional corrections, that is, corrections that indicate the direction of an input change without indicating its magnitude. The only assumption is that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, whereas magnitude corrections have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function using a cutting-plane method, which has a geometric interpretation. Theoretical results establish the convergence of the learning process. The method has been tested in numerical examples, a user study on two human–robot games, and a real-world quadrotor experiment. The results confirm its convergence and further show that it is significantly more effective (higher success rate), more efficient and effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
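The geometric idea behind a cutting-plane update can be illustrated with a small sketch. The abstract does not specify the parameterization of the objective or which cutting-plane variant the authors use, so everything below is an assumption made for illustration: the unknown objective is a toy quadratic J(u) = ||u - theta_true||^2 / 2 in two dimensions, the robot's current motion equals the current estimate, and the update is the classical central-cut ellipsoid method. The names `cut_update`, `simulated_direction`, `mahalanobis_sq`, and `run` are hypothetical helpers, not part of the article.

```python
import math
import random


def cut_update(c, P, a):
    """One central-cut ellipsoid step in 2-D (illustrative assumption).

    The current uncertainty set is {x : (x - c)^T P^-1 (x - c) <= 1}.
    A directional correction `a` says the true parameter lies in the
    half-space a^T (x - c) >= 0. Returns the center and shape matrix of
    the smallest ellipsoid covering the remaining half.
    """
    n = 2
    Pa = [P[0][0] * a[0] + P[0][1] * a[1],
          P[1][0] * a[0] + P[1][1] * a[1]]
    s = math.sqrt(a[0] * Pa[0] + a[1] * Pa[1])  # sqrt(a^T P a)
    b = [Pa[0] / s, Pa[1] / s]                  # invariant to the length of a
    c_new = [c[0] + b[0] / (n + 1), c[1] + b[1] / (n + 1)]
    alpha = n * n / (n * n - 1.0)
    beta = 2.0 / (n + 1)
    P_new = [[alpha * (P[i][j] - beta * b[i] * b[j]) for j in range(2)]
             for i in range(2)]
    return c_new, P_new


def simulated_direction(c, theta_true, rng):
    """Hypothetical human: any direction within 81 degrees of the true
    improvement direction, with an arbitrary (uninformative) magnitude."""
    d = [theta_true[0] - c[0], theta_true[1] - c[1]]
    ang = rng.uniform(-0.45 * math.pi, 0.45 * math.pi)
    mag = rng.uniform(0.1, 10.0)
    cs, sn = math.cos(ang), math.sin(ang)
    return [mag * (cs * d[0] - sn * d[1]), mag * (sn * d[0] + cs * d[1])]


def mahalanobis_sq(x, c, P):
    """(x - c)^T P^-1 (x - c); a value <= 1 means x is inside the ellipsoid."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    dx, dy = x[0] - c[0], x[1] - c[1]
    return (P[1][1] * dx * dx - (P[0][1] + P[1][0]) * dx * dy
            + P[0][0] * dy * dy) / det


def run(steps=15, seed=0):
    """Apply `steps` simulated directional corrections to a toy problem."""
    rng = random.Random(seed)
    theta_true = [1.0, -2.0]                  # hidden from the learner
    c, P = [0.0, 0.0], [[25.0, 0.0], [0.0, 25.0]]  # initial disc of radius 5
    history = []
    for _ in range(steps):
        a = simulated_direction(c, theta_true, rng)
        c, P = cut_update(c, P, a)
        history.append((c, P))
    return theta_true, history


theta_true, history = run()
final_center, final_P = history[-1]
print("final estimate:", final_center)
```

In this sketch, scaling a correction by any positive factor leaves the update unchanged, which mirrors the claim that only the direction matters. Each cut keeps the true parameter inside the ellipsoid and, in two dimensions, shrinks the determinant of the shape matrix by a fixed factor of 16/27 per correction.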
