TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
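The steps named in the abstract (maintain a replay memory of trajectories, sample one, adjust the policy parameters with an off-policy actor-critic update) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `ReplayMemory`, `TabularActorCritic`, and `train` are hypothetical, a tabular softmax policy stands in for the action selection neural network, and the off-policy correction uses truncated importance weights with a Retrace-style return, which is one common choice rather than the claimed technique.

```python
import math
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of whole trajectories (oldest evicted first)."""

    def __init__(self, capacity):
        self.trajectories = deque(maxlen=capacity)

    def __len__(self):
        return len(self.trajectories)

    def add(self, trajectory):
        # A trajectory is a list of (state, action, reward, behavior_prob) steps,
        # where behavior_prob is the probability the acting policy gave `action`.
        self.trajectories.append(trajectory)

    def sample(self, rng):
        return self.trajectories[rng.randrange(len(self.trajectories))]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TabularActorCritic:
    """Hypothetical stand-in for the action selection network plus a critic."""

    def __init__(self, n_states, n_actions):
        self.logits = [[0.0] * n_actions for _ in range(n_states)]  # policy parameters
        self.q = [[0.0] * n_actions for _ in range(n_states)]       # critic estimates

    def policy(self, state):
        return softmax(self.logits[state])

    def train_on_trajectory(self, trajectory, gamma=0.99, lr_pi=0.1,
                            lr_q=0.1, rho_max=1.0):
        ret = 0.0  # assumes the trajectory ends in a terminal state
        for state, action, reward, behavior_prob in reversed(trajectory):
            pi = self.policy(state)
            q_sa = self.q[state][action]
            value = sum(p * q for p, q in zip(pi, self.q[state]))
            # Truncated importance weight corrects for acting off-policy.
            rho = min(rho_max, pi[action] / behavior_prob)
            ret = reward + gamma * ret
            advantage = ret - value
            # Actor step: d log pi(a|s) / d logit_b = 1[b == a] - pi(b|s).
            for b in range(len(pi)):
                grad_log_pi = (1.0 if b == action else 0.0) - pi[b]
                self.logits[state][b] += lr_pi * rho * advantage * grad_log_pi
            # Critic step: move Q(s, a) toward the corrected return.
            self.q[state][action] += lr_q * (ret - q_sa)
            # Retrace-style correction carried back to the previous time step.
            ret = rho * (ret - q_sa) + value


def train(agent, memory, num_updates, rng):
    """Repeatedly sample a stored trajectory and update the agent on it."""
    for _ in range(num_updates):
        agent.train_on_trajectory(memory.sample(rng))
```

In this sketch the acting code would call `memory.add(...)` after each episode and `train(agent, memory, num_updates, random.Random(seed))` between episodes. Storing the behavior probability with each step is what makes the importance weight computable later, when the policy parameters have already changed.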
