Empowered skills

Abstract

Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy typically captures only a single solution of a task. However, many motor tasks have a large variety of solutions, and knowledge of these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, a lack of diversity renders the behavior predictable and hence easy for the opponent to counter. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity to guide the robot and keeps the robot from getting stuck in local optima of the task. To increase the diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space that represents interesting aspects of a sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return speed of the ball. The intrinsic motivation is then given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.
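The trade-off between extrinsic reward and outcome diversity can be illustrated with a small sketch. This is not the algorithm from the paper: the abstract does not give its objective, so the code below assumes a simple stand-in, the mean reward plus a weight `beta` times the entropy of a Gaussian fitted to sampled outcomes. The function names, the `beta` parameter, and the Gaussian entropy proxy are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_entropy(outcomes):
    """Differential entropy of a Gaussian fitted to outcome samples.

    ASSUMPTION: this is a crude proxy for "diversity of outcomes",
    not the empowerment measure used in the paper.
    `outcomes` must have shape (n_samples, outcome_dim).
    """
    outcomes = np.asarray(outcomes, dtype=float)
    assert outcomes.ndim == 2, "expected shape (n_samples, outcome_dim)"
    d = outcomes.shape[1]
    # Sample covariance; reshape covers the 1-D outcome case.
    cov = np.cov(outcomes, rowvar=False).reshape(d, d)
    cov = cov + 1e-6 * np.eye(d)  # small ridge for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def tradeoff_objective(rewards, outcomes, beta=1.0):
    """Mean extrinsic reward plus beta times the diversity proxy.

    ASSUMPTION: a linear trade-off with a hypothetical weight `beta`.
    """
    return float(np.mean(rewards) + beta * gaussian_entropy(outcomes))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = np.ones(200)  # identical extrinsic reward for both policies
    # Hypothetical 2-D outcomes, e.g. return position and return ball speed.
    narrow = 0.01 * rng.standard_normal((200, 2))
    wide = rng.standard_normal((200, 2))
    print("narrow:", tradeoff_objective(rewards, narrow))
    print("wide:  ", tradeoff_objective(rewards, wide))
```

With equal rewards, the policy whose outcomes are more spread out scores higher under this proxy, and setting `beta=0.0` recovers the plain mean reward.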
