International Conference on Systems, Man, and Cybernetics

A shaped-q learning for multi-agents systems

Abstract

This paper proposes an architecture in which each agent maintains a cooperative tendency table (CTT). During learning, agents do not need to communicate with each other; instead, they observe their partners' actions while taking their own. If an agent encounters a bad situation, such as bumping into an obstacle after taking an action, the agents receive a bad reward from the environment. Similarly, if an agent reaches a goal after taking an action, the agents obtain a good reward instead. Rewards are used to update the policy and to adjust the cooperative tendency values recorded in each agent's CTT. When an agent perceives a state, the corresponding cooperative tendency value and Q-value are merged into a Shaped-Q value, and the action with the maximal Shaped-Q value in that state is selected. After the agents take actions and receive a reward, they update their own CTTs. With this method, agents can reach a consensus more quickly, which enhances learning efficiency and reduces the occurrence of stagnation. Simulation results demonstrate that the proposed method speeds up the learning process, alleviates the problem of large memory consumption to some degree, and enables agents to complete the task together more efficiently.
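
To make the mechanism concrete, below is a minimal sketch of how a tabular agent might merge its Q-table with a cooperative tendency table when selecting actions. The class name CTTAgent, the merge weight beta, and all learning constants are illustrative assumptions for a grid-world-style task, not details given in the paper.

```python
import random
from collections import defaultdict

class CTTAgent:
    """Sketch of an agent combining a Q-table with a cooperative tendency table (CTT)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, beta=0.5, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma   # Q-learning step size and discount (assumed values)
        self.beta = beta                        # weight of the cooperative tendency term (assumption)
        self.epsilon = epsilon                  # exploration rate
        self.q = defaultdict(float)             # Q-table: (state, action) -> value
        self.ctt = defaultdict(float)           # cooperative tendency table: (state, action) -> tendency

    def shaped_q(self, state, action):
        # Merge the Q-value and the cooperative tendency value into a single Shaped-Q score.
        return self.q[(state, action)] + self.beta * self.ctt[(state, action)]

    def select_action(self, state):
        # Epsilon-greedy selection over the Shaped-Q value: the action with the
        # maximal Shaped-Q value in this state is chosen most of the time.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.shaped_q(state, a))

    def update(self, state, action, reward, next_state):
        # Standard tabular Q-learning update on the agent's own Q-table.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        # Adjust the cooperative tendency value with the received reward
        # (the exact adjustment rule is not specified in the abstract; this is an assumption).
        self.ctt[(state, action)] += self.alpha * reward
```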
