Training a neural value network

Abstract

A neural network training system comprises one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause them to perform operations for training a neural value network. The neural value network is configured to receive an observation characterizing a state of an environment with which an agent interacts and to process that observation in accordance with the parameters of the neural value network to generate a value score. The operations comprise: training a supervised learning policy neural network, which is configured to receive the observation and to process it in accordance with its parameters to generate a respective action probability for each action in a set of possible actions that the agent can perform to interact with the environment, wherein the training uses supervised learning on labeled training data to determine trained values of the parameters of the supervised learning policy neural network; initializing the parameters of a reinforcement learning policy neural network, which has the same architecture as the supervised learning policy neural network, to the trained parameter values of the supervised learning policy neural network; training the reinforcement learning policy neural network, using reinforcement learning, on second training data generated from interactions of the agent with a simulated version of the environment, to determine trained parameter values of the reinforcement learning policy neural network from the initial values; and training the neural value network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in that state, by training the neural value network, using supervised learning, on third training data generated from interactions of the agent with the simulated version of the environment, to determine trained parameter values of the neural value network from initial parameter values of the neural value network.
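
The staged procedure in the abstract can be illustrated with a small sketch: a first policy is fitted to labeled data, a second policy of the same architecture is initialized from it and refined on simulated interactions, and a value estimator is then regressed onto observed outcomes. Everything concrete here is an assumption made for illustration and is not taken from the patent: the five-cell line world, the tabular softmax "networks", the REINFORCE-style update, the hypothetical expert that always moves right, the choice to generate the value data by rolling out the second policy, and all function names and hyperparameters.

```python
import math
import random

N_STATES = 5        # toy line world (assumption): cells 0..4, both ends terminal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right

def action_probs(theta, state):
    """Policy 'network' (tabular stand-in): softmax over one row of logits."""
    m = max(theta[state])
    exps = [math.exp(x - m) for x in theta[state]]
    total = sum(exps)
    return [e / total for e in exps]

def step(state, action):
    """Simulated environment: returns (next_state, reward, done)."""
    nxt = state + ACTIONS[action]
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # right end: win
    if nxt == 0:
        return nxt, -1.0, True  # left end: loss
    return nxt, 0.0, False

def play_episode(theta, rng, max_steps=20):
    """Roll out a policy from a random non-terminal start state."""
    state = rng.randrange(1, N_STATES - 1)
    trajectory, reward = [], 0.0
    for _ in range(max_steps):
        action = 0 if rng.random() < action_probs(theta, state)[0] else 1
        trajectory.append((state, action))
        state, reward, done = step(state, action)
        if done:
            break
    return trajectory, reward

def train_supervised_policy(labeled, lr=0.5, epochs=50):
    """Stage 1: fit the policy to (state, expert action) pairs by cross-entropy."""
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(epochs):
        for state, label in labeled:
            probs = action_probs(theta, state)
            for a in range(2):
                theta[state][a] += lr * ((1.0 if a == label else 0.0) - probs[a])
    return theta

def train_rl_policy(sl_theta, rng, lr=0.1, episodes=200):
    """Stage 2: same architecture, initialized from stage 1, REINFORCE updates."""
    theta = [row[:] for row in sl_theta]  # copy of the trained stage-1 parameters
    for _ in range(episodes):
        trajectory, reward = play_episode(theta, rng)
        for state, taken in trajectory:
            probs = action_probs(theta, state)
            for a in range(2):
                theta[state][a] += lr * reward * ((1.0 if a == taken else 0.0) - probs[a])
    return theta

def train_value_network(policy_theta, rng, lr=0.05, episodes=500):
    """Stage 3: regress a per-state value onto episode outcomes (squared error)."""
    value = [0.0] * N_STATES
    for _ in range(episodes):
        trajectory, reward = play_episode(policy_theta, rng)
        for state, _ in trajectory:
            value[state] += lr * (reward - value[state])  # SGD step on squared error
    return value

rng = random.Random(0)
labeled = [(s, 1) for s in range(1, N_STATES - 1)]  # hypothetical expert labels
sl_theta = train_supervised_policy(labeled)
rl_theta = train_rl_policy(sl_theta, rng)
value = train_value_network(rl_theta, rng)  # assumption: data from the second policy
print([round(v, 2) for v in value])
```

In this sketch the value estimate for a state is simply a running average of the final rewards of simulated episodes that passed through it, which is one simple way to approximate a predicted long-term reward; a real system would replace the tables with neural networks.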

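Bibliographic data

  • Publication number: DE202016004627U1

    Patent type:

  • Publication date: 2016-09-23

    Original format: PDF

  • Applicant/patentee: GOOGLE INC.

    Application number: DE20162004627U

  • Inventor:

    Filing date: 2016-07-27

  • Classification: G06N3/02

  • Country: DE

  • Database entry time: 2022-08-21 14:08:55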
