DISTRIBUTED TRAINING USING OFF-POLICY ACTOR-CRITIC REINFORCEMENT LEARNING

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
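
The actor/learner split described in the abstract can be illustrated with a minimal single-process sketch. It is not the patented method: the one-state environment, the softmax policy over two logits, the scalar value baseline, and the truncated importance weight (`rho_bar`) are all assumptions chosen to keep the example short, and every class and function name is hypothetical. Actors act with a possibly stale copy of the learner's parameters and record the behaviour probability of each action, which is what lets the learner correct for the policy mismatch.

```python
import math
import random
from collections import deque

NUM_ACTIONS = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_env_step(action):
    """Hypothetical one-state environment: action 1 pays 1.0, action 0 pays 0.0."""
    return 1.0 if action == 1 else 0.0


class Actor:
    """Generates experience tuple trajectories with its own copy of the policy."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.logits = [0.0] * NUM_ACTIONS

    def sync(self, learner_logits):
        # Copy the learner's current parameters; between syncs the copy is stale.
        self.logits = list(learner_logits)

    def generate_trajectory(self, length):
        probs = softmax(self.logits)
        trajectory = []
        for _ in range(length):
            action = self.rng.choices(range(NUM_ACTIONS), weights=probs)[0]
            reward = toy_env_step(action)
            # Experience tuple: (action, reward, behaviour-policy probability).
            trajectory.append((action, reward, probs[action]))
        return trajectory


class Learner:
    """Updates policy logits and a value baseline from actor trajectories."""

    def __init__(self, lr=0.1, rho_bar=1.0):
        self.logits = [0.0] * NUM_ACTIONS
        self.value = 0.0
        self.lr = lr
        self.rho_bar = rho_bar  # assumed cap on the importance weight

    def update(self, trajectory):
        for action, reward, behaviour_prob in trajectory:
            probs = softmax(self.logits)
            # Off-policy correction: ratio of learner policy to behaviour policy.
            rho = min(self.rho_bar, probs[action] / behaviour_prob)
            advantage = reward - self.value  # critic acts as a baseline
            for a in range(NUM_ACTIONS):
                grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
                self.logits[a] += self.lr * rho * advantage * grad_log_pi
            self.value += self.lr * rho * advantage


def train(num_actors=4, rounds=200, trajectory_length=5, sync_every=3):
    learner = Learner()
    actors = [Actor(seed=i) for i in range(num_actors)]
    queue = deque()  # stands in for the actor-to-learner transport
    for round_index in range(rounds):
        for actor in actors:
            if round_index % sync_every == 0:
                actor.sync(learner.logits)
            queue.append(actor.generate_trajectory(trajectory_length))
        while queue:
            learner.update(queue.popleft())
    return learner
```

Calling `train()` returns a `Learner` whose policy should strongly prefer action 1 in this toy setting. In a real deployment the actors and learners would run on separate computing units and the `deque` would be replaced by some form of inter-process transport; the sketch only shows the data flow.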