Training policy neural networks using path consistency learning
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
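The abstract names the ingredients of the consistency error but does not give a formula. The following is a minimal sketch, assuming the form commonly used in published path consistency learning work: the discounted rewards along the path, minus an entropy-weighted path log-likelihood, plus the discounted last soft-max state value, minus the first soft-max state value. The function name, the discount `gamma`, and the temperature `tau` are illustrative assumptions and do not come from the patent text.

```python
import math


def consistency_error(rewards, log_probs, v_first, v_last, gamma=0.99, tau=0.1):
    """Hypothetical path consistency error for one path of d steps.

    rewards   -- rewards r_0 .. r_{d-1} received along the path
    log_probs -- log pi(a_j | s_j) for each action taken along the path
    v_first   -- soft-max state value of the first state on the path
    v_last    -- soft-max state value of the last state on the path
    gamma     -- discount factor (assumed, not stated in the abstract)
    tau       -- entropy regularisation temperature (assumed)
    """
    if len(rewards) != len(log_probs):
        raise ValueError("need one log-probability per reward")
    d = len(rewards)
    # Combined (discounted) reward over the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Discounted log-likelihood of the actions taken along the path.
    path_log_likelihood = sum(gamma ** j * lp for j, lp in enumerate(log_probs))
    return (-v_first
            + gamma ** d * v_last
            + combined_reward
            - tau * path_log_likelihood)


def squared_consistency_loss(error):
    """Squared error; a parameter update could follow its negative gradient."""
    return 0.5 * error ** 2


# Two-step path, uniform policy over two actions, no discounting.
err = consistency_error(
    rewards=[1.0, 1.0],
    log_probs=[math.log(0.5), math.log(0.5)],
    v_first=0.0,
    v_last=0.0,
    gamma=1.0,
    tau=1.0,
)
loss = squared_consistency_loss(err)
```

Under these assumptions, a path is consistent when the error is zero, so training would adjust the policy parameters (and any value estimates) to drive the squared error toward zero across sampled paths.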