Conference: RoboCup International Symposium

Efficient Behavior Learning by Utilizing Estimated State Value of Self and Teammates

Abstract

Applications of reinforcement learning to real robots in multi-agent dynamic environments are limited by the huge exploration space and the enormously long learning time. RoboCup competitions are a typical example, since other agents and their behavior easily cause the state and action spaces to explode. This paper presents a method that uses the state value functions of macro actions to explore appropriate behavior efficiently in a multi-agent environment, so that the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas are as follows. First, the agent learns a few macro actions and their state value functions beforehand by reinforcement learning. Second, an appropriate initial controller for learning cooperative behavior is generated from those state value functions. The initial controller uses the state values of the macro actions so that the learner tends to select a good macro action and avoids useless ones. By combining these ideas with a two-layer hierarchical system, the proposed method performs better during learning than conventional methods. The paper presents a case study of a 4 (defense team) on 5 (offense team) game task, in which the learning agent (a passer on the offense team) successfully acquired the teamwork plays (pass and shoot) in a shorter learning time.
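
The idea of an initial controller that favors macro actions with high state values can be illustrated with a small sketch. The abstract does not say how the state values are turned into action preferences, so the softmax rule, the `temperature` parameter, the function names, and the example values below are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def initial_policy(state_values, temperature=0.5):
    """Map each macro action's estimated state value to a selection probability.

    Assumption: a softmax over state values. The paper may use a different rule.
    """
    highest = max(state_values.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {
        action: math.exp((value - highest) / temperature)
        for action, value in state_values.items()
    }
    total = sum(weights.values())
    return {action: weight / total for action, weight in weights.items()}


def select_macro_action(state_values, temperature=0.5, rng=random):
    """Sample one macro action; higher-valued actions are chosen more often."""
    probabilities = initial_policy(state_values, temperature)
    actions = list(probabilities)
    weights = [probabilities[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical state values of three pre-learned macro actions in the current state.
values = {"pass": 0.8, "shoot": 0.3, "dribble": 0.1}
probs = initial_policy(values)
chosen = select_macro_action(values)
```

With a rule like this, a macro action whose state value is low in the current situation is rarely tried, which is one way the exploration space of the upper-layer learner could be narrowed from the start.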