Journal: Soft Computing: A Fusion of Foundations, Methodologies and Applications

Learning with policy prediction in continuous state-action multi-agent decision processes



Abstract

Inspired by the recent attention to multi-agent reinforcement learning (MARL), the effort to provide efficient methods in this field is increasing. However, many issues make this field challenging. An agent's decision making depends on the other agents' behavior, while sharing information is not always possible. Moreover, predicting other agents' policies while they are also learning is a difficult task, and some agents in a multi-agent environment may not behave rationally. In such cases, achieving Nash equilibrium, the target in a system with ideal behavior, is not possible, and the best policy is the best response to the other agents' policies. In addition, many real-world multi-agent problems have a continuous nature in their state and action spaces, which induces further complexity in MARL scenarios. To overcome these challenges, we propose a new multi-agent learning method based on fuzzy least-squares policy iteration. The proposed method consists of two parts: an Inner Model that approximates the other agents' policies, and a multi-agent method that learns a near-optimal policy based on those approximated policies. Both of the proposed algorithms are applicable to problems with continuous state and action spaces. They can be used independently or in combination, and they are designed to fit each other so that the outputs of the Inner Model are entirely consistent with the inputs expected by the multi-agent method. In problems where explicit communication is not possible, combining the proposed methods is recommended. In addition, theoretical analysis proves the near-optimality of the policies learned by these methods. We evaluate the learning methods on problems with continuous state-action spaces: the well-known predator-prey problem and the unit commitment problem in the smart power grid. The results are satisfactory and show acceptable performance of our methods.
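For readers unfamiliar with the building block named in the abstract, the sketch below illustrates plain least-squares policy iteration (LSTD-Q policy evaluation alternated with greedy improvement) using Gaussian, fuzzy-membership-style features. It is a minimal single-agent sketch under simplifying assumptions (a small finite set of candidate actions and hand-picked membership centers, both hypothetical); it is not the paper's fuzzy MARL algorithm or its Inner Model.

```python
import numpy as np

def make_features(centers, width):
    """Normalized Gaussian memberships over the joint state-action vector
    (a fuzzy-style basis; centers and width are assumptions for illustration)."""
    def phi(s, a):
        x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)])
        d = np.linalg.norm(centers - x, axis=1)
        m = np.exp(-(d / width) ** 2)          # membership degrees
        return m / (m.sum() + 1e-12)           # normalized firing strengths
    return phi

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-6):
    """One policy-evaluation step: solve A w = b for linear Q-function weights."""
    k = phi(*samples[0][:2]).shape[0]
    A = reg * np.eye(k)
    b = np.zeros(k)
    for s, a, r, s_next in samples:            # samples of (state, action, reward, next state)
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

def lspi(samples, phi, candidate_actions, gamma=0.95, n_iter=20):
    """Alternate LSTD-Q evaluation with greedy improvement over a finite
    candidate-action set (a simplification of the continuous-action case)."""
    w = np.zeros(phi(*samples[0][:2]).shape[0])

    def greedy(s):
        # Greedy improvement: pick the candidate action with the highest Q-value.
        return max(candidate_actions, key=lambda a: phi(s, a) @ w)

    for _ in range(n_iter):
        w = lstdq(samples, phi, greedy, gamma)
    return w, greedy
```

In the paper's multi-agent setting, the feature vector would additionally depend on the other agents' (predicted) actions supplied by the Inner Model; the sketch above only shows the underlying least-squares policy-iteration machinery.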
