APPARATUS FOR Q-LEARNING FOR CONTINUOUS ACTIONS WITH CROSS-ENTROPY GUIDED POLICIES AND METHOD THEREOF

Abstract

An apparatus for performing continuous actions includes a memory storing instructions, and a processor configured to execute the instructions to obtain a first action of an agent, based on a current state of the agent, using a cross-entropy guided policy (CGP) neural network, and control to perform the obtained first action. The CGP neural network is trained using a cross-entropy method (CEM) policy neural network for obtaining a second action of the agent based on an input state of the agent, and the CEM policy neural network is trained using a CEM and trained separately from the training of the CGP neural network.
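The abstract does not spell out how the cross-entropy method (CEM) selects an action, so the following is only a minimal sketch of one common reading: CEM repeatedly samples candidate actions, scores them with a Q-function, and refits a Gaussian to the best-scoring samples. Everything named here (`toy_q`, `cem_action`, the population and elite sizes, the action bounds of [-1, 1]) is a hypothetical choice for illustration and does not come from the patent.

```python
import numpy as np

def toy_q(state, actions):
    # Hypothetical stand-in for a learned Q-function (assumption, not from the
    # patent): the score peaks when the action equals tanh(state).
    target = np.tanh(state)
    return -np.sum((actions - target) ** 2, axis=-1)

def cem_action(q_fn, state, action_dim, iters=10, pop=64, elite=8, seed=0):
    """Approximately maximize q_fn(state, .) over actions with CEM."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(action_dim)
    std = np.ones(action_dim)
    for _ in range(iters):
        # Sample a population of candidate actions from the current Gaussian.
        samples = rng.normal(mean, std, size=(pop, action_dim))
        samples = np.clip(samples, -1.0, 1.0)  # assumed action bounds
        scores = q_fn(state, samples)
        # Keep the highest-scoring candidates and refit the Gaussian to them.
        elites = samples[np.argsort(scores)[-elite:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor avoids a zero std
    return mean

state = np.array([0.5, -1.0])
cem_target = cem_action(toy_q, state, action_dim=2)
```

Under this reading, actions such as `cem_target` could serve as regression targets when training the CGP neural network, so that at run time a single forward pass replaces the iterative sampling loop. That training step is an interpretation of the abstract, not a detail it states.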
