Multi-agent PPO Guided By The Best Local Policy
Abstract
The present invention relates to a PPO (proximal policy optimization) algorithm that uses an efficient policy parameter search method guided by the policy of the best agent in a multi-agent system. A method of controlling the training of a policy parameter of each of a plurality of agents may include: a policy training step of controlling each of the plurality of agents to train independently based on a shared guidance policy; for each training, receiving information of each of the plurality of agents from the corresponding agent in order to obtain the variables to be used in training, and transmitting those variables to the plurality of agents after they are determined based on the information of each agent; receiving performance information of each agent from the corresponding agent as the predefined training is performed; and controlling the plurality of agents to share the policy parameter of the best agent, determined based on the received performance information of each agent.
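The control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method and not a real PPO implementation: the PPO update is replaced by a toy random-perturbation step, the exchange of training variables is reduced to a fixed step size, and the names `Agent`, `score`, `run`, and `TARGET` are hypothetical, introduced only for illustration. What it does show is the structure: agents train independently from a shared guidance policy, report their performance, and the best agent's parameters become the next guidance policy.

```python
import random
from dataclasses import dataclass

# Hypothetical optimum of a toy objective (an assumption, not from the source).
TARGET = [1.0, -2.0]


def score(params):
    """Toy performance measure: higher is better, maximum 0 at TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


@dataclass
class Agent:
    """Hypothetical agent holding its own policy parameter vector."""
    params: list
    rng: random.Random

    def train(self, guidance, step_size):
        # Stand-in for a PPO update (NOT real PPO): start from the shared
        # guidance policy, try one random perturbation, and keep it only
        # if it scores better than the guidance policy itself.
        candidate = [g + self.rng.gauss(0.0, step_size) for g in guidance]
        if score(candidate) > score(guidance):
            self.params = candidate
        else:
            self.params = list(guidance)
        # Report performance information back to the coordinator.
        return score(self.params)


def run(num_agents=4, rounds=50, seed=0):
    guidance = [0.0, 0.0]  # shared guidance policy parameters
    agents = [Agent(list(guidance), random.Random(seed + i))
              for i in range(num_agents)]
    history = [score(guidance)]
    for _ in range(rounds):
        # The coordinator sends the training variables (here only a fixed
        # step size, an assumption) and each agent trains independently.
        performances = [a.train(guidance, step_size=0.1) for a in agents]
        # Pick the best agent from the reported performance and share its
        # policy parameters as the guidance policy for the next round.
        best = max(range(num_agents), key=lambda i: performances[i])
        guidance = list(agents[best].params)
        history.append(performances[best])
    return guidance, history


if __name__ == "__main__":
    final_params, history = run()
    print("final guidance parameters:", final_params)
    print("best score per round never decreases:",
          all(b >= a for a, b in zip(history, history[1:])))
```

Because every agent starts each round from the shared guidance policy and never accepts a worse candidate, the best reported score cannot decrease from one round to the next in this toy setting. A real PPO learner would not necessarily have that guarantee.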