Training policy neural networks using path consistency learning

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
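To make the quantities named in the abstract concrete, the following is a minimal sketch of how a consistency error for one path might be computed. It follows the form used in the published path consistency learning formulation, which is an assumption here: the abstract itself does not state the formula. The discount factor `gamma`, the entropy temperature `tau`, and the function and argument names are all hypothetical and chosen for illustration only.

```python
import math


def path_consistency_error(rewards, log_probs, v_first, v_last,
                           gamma=0.99, tau=0.1):
    """Consistency error for one path of d steps (illustrative sketch).

    rewards   : rewards r_0 .. r_{d-1} received along the path
    log_probs : log pi(a_j | s_j) of each action taken along the path
    v_first   : soft-max state value estimate at the first state
    v_last    : soft-max state value estimate at the last state
    gamma, tau: assumed discount factor and entropy temperature
    """
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have equal length")
    d = len(rewards)
    # Combined (discounted) reward over the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Discounted log-likelihood of the path under the current policy.
    path_log_likelihood = sum(gamma ** j * lp
                              for j, lp in enumerate(log_probs))
    return (-v_first + gamma ** d * v_last
            + combined_reward - tau * path_log_likelihood)


def squared_consistency_loss(error):
    """Squared error whose gradient could drive the parameter update."""
    return 0.5 * error ** 2


# Usage: a two-step path with a uniform policy over two actions.
err = path_consistency_error(
    rewards=[1.0, 2.0],
    log_probs=[math.log(0.5), math.log(0.5)],
    v_first=3.0, v_last=0.0, gamma=1.0, tau=1.0)
print(err, squared_consistency_loss(err))
```

In a real training loop the state values and log-probabilities would come from neural networks, and the parameter update would be obtained by differentiating the squared error with respect to their parameters. The sketch only shows how the four ingredients listed in the abstract combine into a single scalar.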

Bibliographic data

Publication number: US10733502B2
Publication date: 2020-08-04
Original format: PDF
Applicant/Assignee: GOOGLE LLC
Application number: US201916504934
Inventors: OFIR NACHUM; MOHAMMAD NOROUZI; DALE ERIC SCHUURMANS; KELVIN XU
Filing date: 2019-07-08
Classification: G06N3/04; G06N3/08
Country: US
Date indexed: 2022-08-21 11:27:00
