Cooperation and Coordination Between Fuzzy Reinforcement Learning Agents in Continuous State Partially Observable Markov Decision Processes

Abstract

Successful operation of future multi-agent intelligent systems requires efficient cooperation schemes between agents that share learning experiences. We consider a pseudo-realistic world in which one or more opportunities appear and disappear at random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worth pursuing, based on their promised rewards, expected lifetimes, path lengths, and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We consider a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. We also implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results demonstrate that K cooperative agents, each learning in a separate world over N time steps, outperform K independent agents, each learning in a separate world over K*N time steps, and this result becomes more pronounced as the degree of partial observability in the environment increases. We also show that cooperation between agents learning in the same world decreases performance relative to independent agents. Since cooperation reduces diversity between agents, we conclude that diversity is a key parameter in the trade-off between maximizing utility from cooperation when diversity is low and maximizing utility from competitive coordination when diversity is high.
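
As a concrete illustration of the cooperation mechanism, the sketch below shows K agents sharing and updating one joint fuzzy Q-function, in the spirit of the abstract's "one joint behavior policy." This is a minimal, hypothetical Python sketch, not the paper's implementation: the one-dimensional state, the triangular membership functions, the `pursue`/`ignore` action set, and the toy reward model are all assumptions introduced for illustration.

```python
# Minimal sketch (illustrative, not the paper's implementation) of agents
# cooperating by using and updating one joint fuzzy Q-function.
import random

CENTERS = [0.0, 0.25, 0.5, 0.75, 1.0]   # centers of triangular fuzzy sets
ACTIONS = ["pursue", "ignore"]           # hypothetical action set

def memberships(x):
    """Normalized triangular membership degrees of state x in [0, 1]."""
    width = 0.25
    mu = [max(0.0, 1.0 - abs(x - c) / width) for c in CENTERS]
    total = sum(mu) or 1.0
    return [m / total for m in mu]       # firing strengths of the rules

class JointFuzzyQ:
    """One joint behavior policy shared by all cooperating agents."""
    def __init__(self, alpha=0.1, gamma=0.95):
        self.q = {a: [0.0] * len(CENTERS) for a in ACTIONS}
        self.alpha, self.gamma = alpha, gamma

    def value(self, x, a):
        # Q(x, a) is a firing-strength-weighted sum of rule weights.
        return sum(m * w for m, w in zip(memberships(x), self.q[a]))

    def act(self, x, eps=0.1):
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value(x, a))

    def update(self, x, a, r, x_next):
        # Fuzzy Q-learning: the TD error is apportioned to each rule
        # in proportion to how strongly it fired for state x.
        target = r + self.gamma * max(self.value(x_next, b) for b in ACTIONS)
        td = target - self.value(x, a)
        for i, m in enumerate(memberships(x)):
            self.q[a][i] += self.alpha * m * td

# K agents, each in its own world, all reading and writing the same policy.
policy = JointFuzzyQ()
worlds = [random.random() for _ in range(3)]       # toy per-agent states
for step in range(1000):
    for k, x in enumerate(worlds):
        a = policy.act(x)
        # Toy reward: pursuing pays more when the opportunity is near.
        r = (1.0 - x) if a == "pursue" else 0.1
        x_next = random.random()                    # opportunity relocates
        policy.update(x, a, r, x_next)              # shared update
        worlds[k] = x_next
```

Under this scheme every agent's temporal-difference update lands in the same shared parameter table, which makes concrete how experience sharing reduces diversity between agents: all agents are driven toward one common policy.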
