TRAINING ACTION SELECTION NEURAL NETWORKS USING A DIFFERENTIABLE CREDIT FUNCTION

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. A reinforcement learning neural network selects actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The reinforcement learning neural network has at least one input to receive an input observation characterizing a state of the environment and at least one output for determining an action to be performed by the agent in response to the input observation. The system includes a reward function network coupled to the reinforcement learning neural network. The reward function network has an input to receive reward data characterizing a reward provided by one or more states of the environment and is configured to determine a reward function to provide one or more target values for training the reinforcement learning neural network.
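
The coupling described in the abstract can be illustrated with a minimal sketch. The abstract does not specify architectures, losses, or how the reward function network itself is trained, so everything here is an assumption made for illustration: the class names, the linear models, the fixed horizon, and the squared-error update are all hypothetical. The reward function network is reduced to a weighted sum over a short sequence of rewards, initialised like a discount schedule, and its own training is not shown.

```python
import numpy as np

# Hypothetical sizes, chosen only for this illustration.
OBS_DIM, N_ACTIONS, HORIZON = 4, 3, 5

rng = np.random.default_rng(0)


class ActionSelectionNetwork:
    """Linear stand-in for the reinforcement learning neural network."""

    def __init__(self):
        # One column of weights per action.
        self.w = rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS))

    def action_values(self, obs):
        # Input: observation of the environment state. Output: one value per action.
        return obs @ self.w

    def select_action(self, obs):
        # Greedy choice over the output values.
        return int(np.argmax(self.action_values(obs)))

    def train_step(self, obs, action, target, lr=0.1):
        # One gradient step on 0.5 * (q - target)^2 for the chosen action.
        error = self.action_values(obs)[action] - target
        self.w[:, action] -= lr * error * obs
        return error


class RewardFunctionNetwork:
    """Maps reward data to a target value (assumed form: weighted sum)."""

    def __init__(self):
        # Assumption: per-step weights start as a discount schedule 0.9**t.
        # In a learned variant these would be trainable parameters.
        self.weights = 0.9 ** np.arange(HORIZON)

    def target_value(self, rewards):
        # rewards: array of length HORIZON, one reward per environment step.
        return float(self.weights @ rewards)


# Usage: the reward function network supplies the target for the action network.
policy = ActionSelectionNetwork()
reward_fn = RewardFunctionNetwork()
observation = np.ones(OBS_DIM)
rewards = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
action = policy.select_action(observation)
policy.train_step(observation, action, reward_fn.target_value(rewards))
```

In this sketch the target value is the only channel from the reward data to the action selection network, which mirrors the role the abstract gives to the reward function network.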
