Journal: Soft Computing: A Fusion of Foundations, Methodologies and Applications

Learning with policy prediction in continuous state-action multi-agent decision processes



Abstract

Inspired by the recent attention to multi-agent reinforcement learning (MARL), the effort to provide efficient methods in this field is increasing. However, many issues make this field challenging. An agent's decision making depends on the other agents' behavior, while sharing information is not always possible. Moreover, predicting other agents' policies while they are also learning is a difficult task, and some agents in a multi-agent environment may not behave rationally. In such cases, achieving Nash equilibrium, the target in a system with ideal behavior, is not possible, and the best policy is the best response to the other agents' policies. In addition, many real-world multi-agent problems have a continuous nature in their state and action spaces, which induces further complexity in MARL scenarios. To overcome these challenges, we propose a new multi-agent learning method based on fuzzy least-squares policy iteration. The proposed method consists of two parts: an Inner Model that approximates the other agents' policies, and a multi-agent method that learns a near-optimal policy based on those approximated policies. Both of the proposed algorithms are applicable to problems with continuous state and action spaces. They can be used independently or in combination, and they are designed to fit each other so that the outputs of the Inner Model are entirely consistent with the inputs expected by the multi-agent method. In problems where explicit communication is not possible, combining the proposed methods is recommended. In addition, theoretical analysis proves the near-optimality of the policies learned by these methods. We evaluate the learning methods on problems with continuous state-action spaces: the well-known predator-prey problem and the unit commitment problem in the smart power grid. The results are satisfactory and show acceptable performance of our methods.
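For readers unfamiliar with the building block named in the abstract, the sketch below illustrates plain least-squares policy iteration (LSTD-Q policy evaluation alternated with greedy improvement) using Gaussian, fuzzy-membership-style features. It is a minimal single-agent sketch under simplifying assumptions (a small finite set of candidate actions and hand-picked membership centers, both hypothetical); it is not the paper's fuzzy MARL algorithm or its Inner Model.

```python
import numpy as np

def make_features(centers, width):
    """Normalized Gaussian memberships over the joint state-action vector
    (a fuzzy-style basis; centers and width are assumptions for illustration)."""
    def phi(s, a):
        x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)])
        d = np.linalg.norm(centers - x, axis=1)
        m = np.exp(-(d / width) ** 2)          # membership degrees
        return m / (m.sum() + 1e-12)           # normalized firing strengths
    return phi

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-6):
    """One policy-evaluation step: solve A w = b for linear Q-function weights."""
    k = phi(*samples[0][:2]).shape[0]
    A = reg * np.eye(k)
    b = np.zeros(k)
    for s, a, r, s_next in samples:            # samples of (state, action, reward, next state)
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, phi, candidate_actions, gamma=0.95, n_iter=20):
    """Alternate LSTD-Q evaluation with greedy improvement over a finite
    candidate-action set (a simplification of the continuous-action case)."""
    w = np.zeros(phi(*samples[0][:2]).shape[0])

    def greedy(s):
        # Greedy improvement: pick the candidate action with the highest Q-value.
        return max(candidate_actions, key=lambda a: phi(s, a) @ w)

    for _ in range(n_iter):
        w = lstdq(samples, phi, greedy, gamma)
    return w, greedy
```

In the paper's multi-agent setting, the feature vector would additionally depend on the other agents' (predicted) actions supplied by the Inner Model; the sketch above only shows the underlying least-squares policy-iteration machinery.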
