
Multi-agent PPO Guided By The Best Local Policy



Abstract

The present invention relates to a PPO algorithm that uses an efficient policy-parameter search method guided by the policy of the best agent in a multi-agent system. A method of controlling the training of the policy parameters of each of a plurality of agents may include: a policy training step of controlling each of the plurality of agents to train independently based on a shared guidance policy, in which, for each training round, information is received from each agent in order to obtain the variables to be used in training, and those variables, determined from each agent's information, are transmitted to the plurality of agents; a step of receiving performance information from each agent as training proceeds for a predefined interval; and a step of controlling the plurality of agents to share the policy parameters of the best agent, determined from the received performance information of each agent.
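The coordination loop described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the shared best-agent parameters serve as the guidance policy, guidance acts as a quadratic penalty with weight `beta`, and the PPO update itself is replaced by a toy noisy gradient step on a made-up objective (this is NOT real PPO). The exchange of "variables to be used in training" is reduced to fixed `beta` and `lr` values passed by the controller. All names (`TARGET`, `Agent`, `train`, `share_every`, `beta`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical optimum of a toy objective that stands in for an RL task.
TARGET: List[float] = [1.0, -2.0, 0.5]


@dataclass
class Agent:
    """One independently trained learner holding its own policy parameters."""
    params: List[float]
    rng: random.Random

    def train_step(self, guidance: Optional[List[float]], beta: float, lr: float) -> None:
        # Stand-in for a PPO update (assumption, NOT real PPO): noisy gradient
        # ascent on -||params - TARGET||^2, plus a penalty pulling the
        # parameters toward the shared guidance policy once one exists.
        for i, p in enumerate(self.params):
            grad = -2.0 * (p - TARGET[i]) + self.rng.gauss(0.0, 0.5)
            if guidance is not None:
                grad -= beta * (p - guidance[i])
            self.params[i] = p + lr * grad

    def performance(self) -> float:
        # Toy "episode return": higher is better, maximised at TARGET.
        return -sum((p - t) ** 2 for p, t in zip(self.params, TARGET))


def train(num_agents: int = 4, iterations: int = 200, share_every: int = 20,
          beta: float = 0.5, lr: float = 0.05,
          seed: int = 0) -> Tuple[Optional[List[float]], float]:
    master = random.Random(seed)
    agents = [
        Agent([master.uniform(-5.0, 5.0) for _ in TARGET],
              random.Random(master.randrange(2**32)))
        for _ in range(num_agents)
    ]
    guidance: Optional[List[float]] = None  # no guidance before the first sharing round
    best_perf = float("-inf")
    for it in range(1, iterations + 1):
        # Policy training step: every agent updates independently.
        for agent in agents:
            agent.train_step(guidance, beta, lr)
        if it % share_every == 0:
            # Controller collects each agent's performance and selects the best agent...
            scores = [agent.performance() for agent in agents]
            best = max(range(num_agents), key=lambda k: scores[k])
            # ...whose parameters become the guidance policy shared by all agents.
            guidance = list(agents[best].params)
            best_perf = scores[best]
    return guidance, best_perf


guidance, best_perf = train()
print([round(g, 2) for g in guidance], round(best_perf, 4))
```

In this sketch `beta` trades off each agent's own exploration against staying close to the current best policy; setting `beta = 0` reduces the loop to fully independent training with periodic evaluation only.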


Bibliographic data

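Publication No.: KR102147017B1
Publication date: 2020-08-21
Original format: PDF
Applicant/Patentee: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
Application No.: KR20180103642
Filing date: 2018-08-31
Inventors: 성영철; 정휘영
Classification: G06N3/08; G06N99
Country: KR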
