TRAINING ACTION SELECTION NEURAL NETWORKS USING A DIFFERENTIABLE CREDIT FUNCTION

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. A reinforcement learning neural network selects actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The reinforcement learning neural network has at least one input to receive an input observation characterizing a state of the environment and at least one output for determining an action to be performed by the agent in response to the input observation. The system includes a reward function network coupled to the reinforcement learning neural network. The reward function network has an input to receive reward data characterizing a reward provided by one or more states of the environment and is configured to determine a reward function to provide one or more target values for training the reinforcement learning neural network.
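
The arrangement described in the abstract can be illustrated with a minimal sketch. This is only one possible reading, not the patented method: the linear models, the fixed `params` weights, the squared-error update, and every function name below are assumptions introduced for illustration. It shows an action-selection network that maps an observation to per-action scores, and a separate reward-function component that turns received rewards into a target value used to train it.

```python
def q_values(weights, observation):
    """Linear action-selection network: one score per action."""
    return [sum(w * x for w, x in zip(row, observation)) for row in weights]


def select_action(weights, observation):
    """Pick the action with the highest score (greedy selection)."""
    scores = q_values(weights, observation)
    return max(range(len(scores)), key=scores.__getitem__)


def reward_function(params, rewards):
    """Hypothetical reward-function network: a weighted sum of the
    received rewards, yielding one scalar training target."""
    return sum(p * r for p, r in zip(params, rewards))


def train_step(weights, observation, action, target, lr=0.1):
    """Move the chosen action's score toward the target
    (one gradient step on half the squared error)."""
    error = target - q_values(weights, observation)[action]
    weights[action] = [
        w + lr * error * x for w, x in zip(weights[action], observation)
    ]
    return error


# Illustrative data (all values are made up for the example).
weights = [[0.0, 0.0], [0.0, 0.0]]  # 2 actions x 2 observation features
params = [1.0, 0.5, 0.25]           # assumed reward-function parameters
observation = [1.0, 0.0]
rewards = [1.0, 0.0, 2.0]           # rewards from successive states

target = reward_function(params, rewards)
for _ in range(50):
    train_step(weights, observation, action=1, target=target)

trained_score = q_values(weights, observation)[1]
chosen = select_action(weights, observation)
```

In this sketch the reward-function parameters stay fixed. The abstract says the reward function network determines the reward function, which suggests those parameters are themselves produced or adjusted by the system; how that is done is not stated in the abstract and is not modeled here.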

Bibliographic data

Publication number: EP3593289A1

Patent type:
Publication date: 2020-01-15

Original format: PDF
Applicant/patentee: DEEPMIND TECHNOLOGIES LIMITED
Application number: EP20180726144
Inventors: XU ZHONGWEN; HASSELT HADO PHILIP VAN; MODAYIL JOSEPH VARUGHESE; DA MOTTA BARRETO ANDRE; SILVER DAVID
Filing date: 2018-05-22
Classification: G06N3/04; G06N3/08; G06N3
Country: EP
Added to database: 2022-08-21 11:39:39
