Conference: International Conference on Knowledge-Based Intelligent Information and Engineering Systems

A Learning Automata Approach to Multi-agent Policy Gradient Learning



Abstract

The policy gradient method is a popular technique for implementing reinforcement learning in an agent system. One reason is that a policy gradient learner has a simple design and strong theoretical properties in single-agent domains. Previously, Williams showed that the REINFORCE algorithm is a special case of policy gradient learning. He also showed that a learning automaton can be seen as a special case of the REINFORCE algorithm. Learning automata theory guarantees that a group of automata will converge to a stable equilibrium in team games. In this paper we show a theoretical connection between learning automata and policy gradient methods in order to transfer this theoretical result to multi-agent policy gradient learning. An appropriate exploration technique is crucial for the convergence of a multi-agent system. Since learning automata are guaranteed to converge, they possess such an exploration technique. We identify an exact mapping of a learning automaton onto the Boltzmann exploration strategy with a suitable temperature setting. The novel idea is that the temperature of the Boltzmann function depends not on time but on the action probabilities of the agents.
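The setting described in the abstract, a group of learning automata playing a team game with a shared reward, can be illustrated with a minimal sketch. It assumes the linear reward-inaction update as the automaton scheme; the abstract does not say which scheme the paper analyses. The function names `lri_update` and `run_team_game`, the learning rate, and the payoff matrix are hypothetical choices made only for illustration, and the sketch does not reproduce the paper's Boltzmann temperature mapping.

```python
import random


def lri_update(probs, action, reward, lr=0.05):
    """Linear reward-inaction step (assumed scheme, not taken from the paper).

    Moves probability mass toward the chosen action in proportion to the
    reward, which is expected to lie in [0, 1]. A reward of 0 leaves the
    probabilities unchanged.
    """
    return [
        p + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def run_team_game(payoff, steps=5000, lr=0.05, seed=0):
    """Two automata repeatedly play a common-payoff matrix game.

    payoff[a][b] is the shared reward in [0, 1] when the row automaton
    picks action a and the column automaton picks action b.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    p_row = [1.0 / n_rows] * n_rows  # start from uniform action probabilities
    p_col = [1.0 / n_cols] * n_cols
    for _ in range(steps):
        a = rng.choices(range(n_rows), weights=p_row)[0]
        b = rng.choices(range(n_cols), weights=p_col)[0]
        r = payoff[a][b]  # both automata receive the same reward
        p_row = lri_update(p_row, a, r, lr)
        p_col = lri_update(p_col, b, r, lr)
    return p_row, p_col


if __name__ == "__main__":
    # Hypothetical 2x2 team game with two pure equilibria on the diagonal.
    game = [[1.0, 0.0],
            [0.0, 0.6]]
    print(run_team_game(game))
```

In this sketch the exploration comes entirely from the action probabilities themselves, with no separate time-based schedule, which is the property the abstract attributes to learning automata.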
