TRAINING A POLICY NEURAL NETWORK AND A VALUE NEURAL NETWORK

Abstract

Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.
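The four operations named in the abstract (supervised policy training, parameter copy into a same-architecture reinforcement learning policy, reinforcement learning on second training data, and value training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the "networks" are small lookup tables of logits, the environment is a hypothetical one-step task in which the rewarded action is `state % 2`, the reinforcement learning step uses a plain REINFORCE update, and the value table is fitted to rewards observed while following the trained reinforcement learning policy.

```python
import copy
import math
import random

N_STATES, N_ACTIONS = 4, 2  # hypothetical toy sizes


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(state, action):
    # Hypothetical environment: +1 for the "right" action, -1 otherwise.
    return 1.0 if action == state % N_ACTIONS else -1.0


def new_policy():
    # Stand-in for a policy network: one logit per (state, action).
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def train_supervised(policy, pairs, lr=0.1, epochs=20):
    # Gradient ascent on the log-likelihood of labelled (state, action) pairs.
    for _ in range(epochs):
        for s, a in pairs:
            probs = softmax(policy[s])
            for k in range(N_ACTIONS):
                policy[s][k] += lr * ((1.0 if k == a else 0.0) - probs[k])
    return policy


def train_reinforce(policy, rng, lr=0.2, episodes=2000):
    # REINFORCE (an assumed choice): scale the log-probability gradient
    # of the sampled action by the reward received.
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        probs = softmax(policy[s])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r = reward(s, a)
        for k in range(N_ACTIONS):
            policy[s][k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return policy


def train_value(policy, rng, lr=0.05, episodes=4000):
    # Stand-in for a value network: move value[s] toward the reward
    # observed when the given policy acts from state s.
    value = [0.0] * N_STATES
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        a = rng.choices(range(N_ACTIONS), weights=softmax(policy[s]))[0]
        value[s] += lr * (reward(s, a) - value[s])
    return value


def run_pipeline(seed=0):
    rng = random.Random(seed)
    # First training data: labelled pairs, three correct and one wrong per state.
    pairs = [(s, s % N_ACTIONS) for s in range(N_STATES) for _ in range(3)]
    pairs += [(s, (s + 1) % N_ACTIONS) for s in range(N_STATES)]
    sl_policy = train_supervised(new_policy(), pairs)
    # Same architecture, initialised to the trained supervised parameters.
    rl_policy = copy.deepcopy(sl_policy)
    train_reinforce(rl_policy, rng)
    value = train_value(rl_policy, rng)
    return sl_policy, rl_policy, value
```

Calling `run_pipeline()` returns the supervised policy table, the reinforcement learning policy table, and the value table. The `deepcopy` step is the point of the sketch: the reinforcement learning policy starts from the supervised parameters instead of from scratch, and the value table is trained only after that policy exists.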

Bibliographic data

Publication number: US2018032863A1

Patent type:
Publication date: 2018-02-01

Original format: PDF
Applicant/Patentee: GOOGLE INC.

Application number: US201615280711
Inventors: THORE KURT HARTWIG GRAEPEL; SHIH-CHIEH HUANG; DAVID SILVER; ARTHUR CLEMENT GUEZ; LAURENT SIFRE; ILYA SUTSKEVER; CHRISTOPHER MADDISON

Filing date: 2016-09-29
Classification: G06N3/08; G06N3/04
Country: US
Database entry time: 2022-08-21 12:59:51
