Journal: Systems and Computers in Japan

Applying the Policy Gradient Method to Behavior Learning in Multiagent Systems: The Pursuit Problem

Abstract

In the field of multiagent systems, some methods use the policy gradient method for behavior learning. These methods reduce the learning problem of the multiagent system to an independent learning problem for each agent by adopting autonomous, distributed action selection. That is, each agent uses a parameterized stochastic policy, and the parameters are updated in the direction of the maximum gradient so as to maximize the expected reward. In this paper, we first treat action selection at each time step as the minimization of an objective function, and adopt as the stochastic policy the Boltzmann distribution whose energy function is this objective function. Next, we show that this objective function can be expressed in terms such as the value of the state, the state-action rule, and the potential. Finally, in an experiment applying the method to a pursuit problem, a good policy was obtained, and the method proved flexible enough to accommodate heuristics and modifications of the behavioral constraints and objectives in the policy.
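
The idea of a Boltzmann policy whose energy is a parameterized objective function can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the linear energy E(s, a; theta) = sum_k theta_k * phi_k(s, a), the feature vectors, the temperature, and the REINFORCE-style update are hypothetical stand-ins and not the paper's actual formulation.

```python
import math


def boltzmann_policy(energies, temperature=1.0):
    """Return action probabilities pi(a) proportional to exp(-E(a) / T)."""
    # Shift by the minimum energy for numerical stability; this does not
    # change the resulting probabilities.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def energy(theta, features):
    """Hypothetical linear energy: E(s, a; theta) = sum_k theta_k * phi_k(s, a)."""
    return sum(t * f for t, f in zip(theta, features))


def grad_log_policy(theta, action_features, action, temperature=1.0):
    """Gradient of log pi(action) with respect to theta for the linear energy.

    d/dtheta log pi(a) = -(phi(a) - sum_b pi(b) * phi(b)) / T
    """
    energies = [energy(theta, f) for f in action_features]
    probs = boltzmann_policy(energies, temperature)
    n = len(theta)
    expected = [
        sum(p * f[k] for p, f in zip(probs, action_features)) for k in range(n)
    ]
    return [
        -(action_features[action][k] - expected[k]) / temperature
        for k in range(n)
    ]


def reinforce_step(theta, episode, reward, learning_rate=0.1, temperature=1.0):
    """One gradient-ascent update on the expected reward (REINFORCE-style).

    `episode` is a list of (action_features, chosen_action) pairs, one per
    time step, where action_features[a] is the feature vector of action a.
    """
    total = [0.0] * len(theta)
    for action_features, action in episode:
        g = grad_log_policy(theta, action_features, action, temperature)
        total = [t + gi for t, gi in zip(total, g)]
    return [t + learning_rate * reward * gi for t, gi in zip(theta, total)]
```

In this sketch, each agent would hold its own `theta` and call `reinforce_step` independently after an episode, which mirrors the reduction to per-agent learning described in the abstract. Actions with lower energy receive higher probability, and a positive reward lowers the energy of the actions that were taken.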
