Applying the Policy Gradient Method to Behavior Learning in Multiagent Systems: The Pursuit Problem

Seiji Ishihara; Harukazu Igarashi

Abstract

In the field of multiagent systems, several methods use the policy gradient method for behavior learning. In these methods, the learning problem of the multiagent system is reduced to an independent learning problem for each agent by adopting an autonomous, distributed method of determining behavior. That is, a probabilistic policy containing parameters is used as the policy of each agent, and the parameters are updated along the gradient of the expected reward so as to maximize it. In this paper, we first regarded the action-selection problem at each time step as the minimization of an objective function, and adopted as the probabilistic policy the Boltzmann distribution whose energy function is this objective function. Next, we showed that this objective function can be expressed in terms such as state values, state-action rules, and potentials. Finally, in an experiment applying this method to the pursuit problem, a good policy was obtained, and the method proved flexible: it can readily incorporate heuristics and accommodate changes to the behavioral constraints and objectives built into the policy.
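The Boltzmann policy and gradient update described in the abstract can be sketched in a few lines. This is a minimal sketch under assumptions that do not come from the paper: the energy (objective) function is assumed to be linear in the parameters, E(a; s) = -Σ_k w_k φ_k(s, a), where the feature function φ is a hypothetical stand-in for terms such as state values, state-action rules, and potentials; the reward is assumed to be a single scalar given at the end of an episode; and the update is a plain REINFORCE-style step. The names `boltzmann_probs`, `grad_log_pi`, `reinforce_update`, and `toy_phi` are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical feature map phi(state, action) -> list of floats. In the paper's
# terms these could encode state values, state-action rules, or potentials.
FeatureFn = Callable[[object, object], List[float]]


def energies(weights: List[float], phi: FeatureFn, state, actions: Sequence) -> List[float]:
    """Energy E(a; s) = -sum_k w_k * phi_k(s, a); lower energy = preferred action."""
    return [-sum(w * f for w, f in zip(weights, phi(state, a))) for a in actions]


def boltzmann_probs(weights, phi, state, actions, temperature=1.0):
    """pi(a|s) = exp(-E(a; s)/T) / Z, shifted by the minimum energy for stability."""
    e = energies(weights, phi, state, actions)
    e_min = min(e)
    unnorm = [math.exp(-(ei - e_min) / temperature) for ei in e]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def grad_log_pi(weights, phi, state, actions, chosen, temperature=1.0):
    """d ln pi(chosen|s) / d w_k = (phi_k(s, chosen) - E_pi[phi_k(s, .)]) / T."""
    probs = boltzmann_probs(weights, phi, state, actions, temperature)
    feats = [phi(state, a) for a in actions]
    expected = [sum(p * f[k] for p, f in zip(probs, feats)) for k in range(len(weights))]
    chosen_feats = feats[list(actions).index(chosen)]
    return [(c - m) / temperature for c, m in zip(chosen_feats, expected)]


def reinforce_update(weights, episode, reward, phi, learning_rate=0.1, temperature=1.0):
    """REINFORCE-style step: w_k += lr * r * sum_t d ln pi(a_t|s_t) / d w_k.

    `episode` is a list of (state, candidate_actions, chosen_action) triples and
    `reward` is one scalar received at the end of the episode (an assumption).
    """
    eligibility = [0.0] * len(weights)
    for state, actions, chosen in episode:
        g = grad_log_pi(weights, phi, state, actions, chosen, temperature)
        eligibility = [el + gi for el, gi in zip(eligibility, g)]
    return [w + learning_rate * reward * el for w, el in zip(weights, eligibility)]


# Toy usage: one hypothetical feature that is 1.0 when the hunter moves toward the prey.
def toy_phi(state, action):
    return [1.0 if action == "toward" else 0.0]


random.seed(0)
toy_actions = ["toward", "away"]
toy_weights = [0.0]
for _ in range(200):
    p = boltzmann_probs(toy_weights, toy_phi, None, toy_actions)
    a = random.choices(toy_actions, weights=p)[0]
    r = 1.0 if a == "toward" else 0.0  # reward only for approaching the prey
    toy_weights = reinforce_update(toy_weights, [(None, toy_actions, a)], r, toy_phi)
print(toy_weights, boltzmann_probs(toy_weights, toy_phi, None, toy_actions))
```

For a Boltzmann policy the log-policy gradient has the closed form (φ_k(s, a) - E_π[φ_k(s, ·)]) / T, so each weight moves toward the features of actions chosen in rewarded episodes. Subtracting the minimum energy before exponentiating leaves the probabilities unchanged and avoids overflow.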

Bibliographic Information

Source: Systems and Computers in Japan, 2006, No. 10, 9 pages
Authors: Seiji Ishihara; Harukazu Igarashi
Original format: PDF
Text language: English
CLC classification: Computing technology, computer technology
Keywords: Reinforcement learning; Policy gradient method; Pursuit problem; Multiagent system

Similar Literature

1. Applying the Policy Gradient Method to Behavior Learning in Multiagent Systems: The Pursuit Problem [J]. Seiji Ishihara, Harukazu Igarashi. Systems and Computers in Japan, 2006, No. 10.
2. Policy gradient method in multi-agent systems - pursuit problem [J]. Seiji Ishihara, Harukazu Igarashi. 電子情報通信学会技術研究報告. 人工知能と知識処理. Artificial Intelligence and Knowledge Based Processing, 2002, No. 615.
3. Policy Gradient Methods in Multi-Agent Systems - Pursuit Problem [C]. Seiji Ishihara, Harukazu Igarashi. International Conference on Hybrid Intelligent Systems, 2003.
4. Explaining Collective Behavior with Dynamical Systems: Spatial Gradient Sensing in Eukaryotic Chemotaxis and Learning Dynamics in Multiagent Reinforcement Learning [D]. Daniel Shams. 2019.
5. Correction: Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail [O]. Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik. 2009.
6. Applying statistical methods in knowledge management of a multiagent system [O]. Kohut Ondřej, Košinár Michal. 2010.
