Training policy neural networks using path consistency learning

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
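
The abstract names the four ingredients of the consistency error but gives no formula. The sketch below shows one way they might be combined, following the published path consistency learning formulation rather than anything stated in the abstract itself. The function name, the discount factor `gamma`, and the entropy temperature `tau` are assumptions introduced for illustration.

```python
def path_consistency_error(rewards, log_probs, v_first, v_last,
                           gamma=0.99, tau=0.1):
    """Consistency error for one path of d steps (illustrative sketch).

    rewards   -- r_0 .. r_{d-1}, the rewards received along the path
    log_probs -- log pi(a_j | s_j) for each action taken along the path
    v_first   -- soft-max state value estimate at the first state
    v_last    -- soft-max state value estimate at the last state
    gamma     -- discount factor (assumed; not given in the abstract)
    tau       -- entropy temperature (assumed; not given in the abstract)
    """
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have the same length")
    d = len(rewards)
    # Combined reward: discounted sum of the rewards along the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Path likelihood term: discounted sum of the action log-probabilities.
    path_log_likelihood = sum(gamma ** j * lp for j, lp in enumerate(log_probs))
    # The error is zero when the first state value equals the discounted last
    # state value plus the entropy-regularized return collected in between.
    return (-v_first + gamma ** d * v_last
            + combined_reward - tau * path_log_likelihood)
```

Under these assumptions, a parameter update could be obtained by taking a gradient step on half the squared error with respect to the policy network parameters (through `log_probs`) and, where the state values are also learned, with respect to the value parameters.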
