Cooperation action acquisition of the autonomous robot swarm by reinforcement learning in a seesaw balancing problem

Abstract

This paper proposes a new approach to realizing a reinforcement learning scheme for an autonomous multi-agent system. We treat a cooperative agent system consisting of multiple autonomous mobile robots that are given a seesaw balancing task. This problem is an example of the class of tasks in which appropriate locations must be found for multiple mobile robots. A further issue is that, in an environment where autonomous robots act cooperatively, the failure of a single agent may destabilize the whole system; it is therefore practically important to design the control system with stability under agent breakdown in mind. Each robot agent on the seesaw maintains the balanced state by using its own reinforcement learning system. The best-known reinforcement learning algorithm is Q-learning; however, it requires the feasible actions of the robot agents to be categorized into discrete action values. In this study, the actor-critic method is therefore applied to handle continuous-valued agent actions. Each robot agent holds a set of normal distributions that determine the distance of the robot's movement for the corresponding state of the seesaw system. Based on the result of each movement, the normal distributions are modified by the actor-critic learning method. Simulation results show the effectiveness of the proposed method.
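The mechanism described above, a per-state normal distribution over a continuous movement distance whose parameters are adjusted by actor-critic learning, can be illustrated with a short sketch. This is a generic reconstruction under stated assumptions, not the paper's implementation: the class and method names, the discrete tilt-state encoding, the learning rates, and the toy reward in the usage lines are all hypothetical.

```python
import numpy as np

# Minimal sketch of an actor-critic update with a Gaussian (normal) policy
# over a continuous action, here read as a robot's movement distance.
# Everything below (names, state encoding, learning rates) is assumed for
# illustration; the paper's seesaw model and parameters are not reproduced.

class GaussianActorCritic:
    def __init__(self, n_states, alpha_actor=0.01, alpha_critic=0.1, gamma=0.95):
        self.mu = np.zeros(n_states)         # per-state mean movement distance
        self.log_sigma = np.zeros(n_states)  # per-state log std-dev (exploration)
        self.value = np.zeros(n_states)      # critic: state-value estimates
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma

    def act(self, s):
        # Sample a continuous movement distance from N(mu[s], sigma[s]^2).
        return np.random.normal(self.mu[s], np.exp(self.log_sigma[s]))

    def update(self, s, a, reward, s_next):
        # TD error from the critic drives both the critic and actor updates.
        delta = reward + self.gamma * self.value[s_next] - self.value[s]
        self.value[s] += self.alpha_critic * delta

        # Policy-gradient step on the Gaussian parameters: actions that did
        # better than expected (delta > 0) pull the distribution toward them.
        sigma = np.exp(self.log_sigma[s])
        grad_mu = (a - self.mu[s]) / sigma**2
        grad_log_sigma = (a - self.mu[s])**2 / sigma**2 - 1.0
        self.mu[s] += self.alpha_actor * delta * grad_mu
        self.log_sigma[s] += self.alpha_actor * delta * grad_log_sigma

# Toy usage with a hypothetical 5-level discretization of the seesaw tilt;
# the reward (negative movement magnitude) is purely illustrative.
agent = GaussianActorCritic(n_states=5)
a = agent.act(2)
agent.update(2, a, reward=-abs(a), s_next=2)
```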

Bibliographic information

  • Source
    《北海道工業大学研究紀要》 | 2002, No. 30 | 8 pages
  • Author affiliation

    Department of Applied Physics and Advanced Technologies (FATA) National Autonomous University of Mexico A.P. 1-1010 Queretaro Qro. 67000 Mexico;


  • Indexing information
  • Original format: PDF
  • Language: Japanese (jpn)
  • Chinese Library Classification: F43;
  • Keywords

    Reinforcement learning; Actor-critic; Seesaw balancing problems; Multiagent system;


