A Learning Automata Approach to Multi-agent Policy Gradient Learning


Abstract

The policy gradient method is a popular technique for implementing reinforcement learning in an agent system. One reason is that a policy gradient learner has a simple design and strong theoretical properties in single-agent domains. Previously, Williams showed that the REINFORCE algorithm is a special case of policy gradient learning. He also showed that a learning automaton can be seen as a special case of the REINFORCE algorithm. Learning automata theory guarantees that a group of automata will converge to a stable equilibrium in team games. In this paper we show a theoretical connection between learning automata and policy gradient methods in order to transfer this theoretical result to multi-agent policy gradient learning. An appropriate exploration technique is crucial for the convergence of a multi-agent system. Since learning automata are guaranteed to converge, they possess such an exploration technique. We identify the mapping of a learning automaton onto the Boltzmann exploration strategy with a suitable temperature setting. The novel idea is that the temperature of the Boltzmann function depends not on time but on the action probabilities of the agents.
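
To make the team-game setting in the abstract concrete, the following is a minimal sketch of two independent learning automata sharing a common reward. It assumes the standard linear reward-inaction update, which is one common learning automaton scheme; the abstract does not say which update rule the paper uses, and this sketch does not reproduce the paper's Boltzmann temperature mapping. The payoff matrix, step count, learning rate, and the function names `lri_update` and `play_team_game` are hypothetical choices made for illustration.

```python
import random


def lri_update(probs, action, reward, alpha=0.05):
    """Linear reward-inaction update (assumed scheme); reward must lie in [0, 1].

    The chosen action's probability moves toward 1 in proportion to the reward;
    all other probabilities shrink by the same factor, so the total stays 1.
    A reward of 0 leaves the probabilities unchanged ("inaction").
    """
    return [
        p + alpha * reward * (1.0 - p) if i == action else p - alpha * reward * p
        for i, p in enumerate(probs)
    ]


def play_team_game(payoff, steps=5000, alpha=0.05, seed=0):
    """Two automata repeatedly play a common-payoff (team) matrix game."""
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    p1 = [1.0 / n_rows] * n_rows  # row agent starts uniform
    p2 = [1.0 / n_cols] * n_cols  # column agent starts uniform
    for _ in range(steps):
        a1 = rng.choices(range(n_rows), weights=p1)[0]
        a2 = rng.choices(range(n_cols), weights=p2)[0]
        reward = payoff[a1][a2]  # both agents receive the same reward
        p1 = lri_update(p1, a1, reward, alpha)
        p2 = lri_update(p2, a2, reward, alpha)
    return p1, p2


# Hypothetical 2x2 team game with two pure equilibria, (0, 0) and (1, 1).
PAYOFF = [[1.0, 0.0], [0.0, 0.5]]

if __name__ == "__main__":
    p1, p2 = play_team_game(PAYOFF)
    print("row agent:", p1)
    print("column agent:", p2)
```

Each agent sees only its own action and the shared reward, never the other agent's action, which is the sense in which the automata learn independently. Which equilibrium a given run approaches depends on the random seed and the learning rate.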

Bibliographic details

Source
International Conference on Knowledge-Based Intelligent Information and Engineering Systems, 2008, 12 pages
Authors
Maarten Peeters; Ville Koenoenen; Katja Verbeeck; Ann Nowe
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Generalized learning automata for multi-agent reinforcement learning [J]. Yann-Michaeel De Hauwere, Peter Vrancx, Ann Nowe. AI Communications. 2010, No. 4
2. A reinforcement learning approach for developing routing policies in multi-agent production scheduling [J]. Yi-Chi Wang, John M. Usher. The International Journal of Advanced Manufacturing Technology. 2007, No. 3-4
3. A Learning Automata Approach to Multi-agent Policy Gradient Learning [C]. Maarten Peeters, Ville Koenoenen, Katja Verbeeck. International Conference on Knowledge-Based Intelligent Information and Engineering Systems; KES 2008. 2008
4. Policy-Aware Model Learning for Policy Gradient Methods [D]. Abachi, Romina. 2020
5. On-Demand Channel Bonding in Heterogeneous WLANs: A Multi-Agent Deep Reinforcement Learning Approach [O]. Hang Qi, Hao Huang, Zhiqun Hu. 2020
6. Local Policy-sharing Systems for Multi-agent Reinforcement Learning: An Approach from the Learning Classifier System [O]. Hiroyasu INOUE, Katsunori SHIMOHARA, Osamu KATAI. 2006
