CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING.

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
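
The update described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the linear actor and critic, the one-step bootstrapped target `rew + gamma * Q(next_obs, actor(next_obs))`, the discount `gamma`, the learning rate `lr`, and every function and parameter name are hypothetical choices made for the example. The abstract only says that a target output is determined, that the critic is updated from the errors between target and output, and that the actor is then updated using the critic.

```python
import numpy as np


def critic(params, obs, act):
    """Linear critic (assumed form): one scalar output per (observation, action)."""
    return obs @ params["w_obs"] + act @ params["w_act"] + params["b"]


def actor(params, obs):
    """Linear deterministic actor (assumed form): maps observations to actions."""
    return obs @ params["w"]


def update_step(actor_p, critic_p, batch, gamma=0.99, lr=0.01):
    """One actor-critic update on a minibatch of experience tuples."""
    obs, act, rew, next_obs = batch
    n = len(rew)

    # Critic output for the training observation and action of each tuple.
    q = critic(critic_p, obs, act)

    # Target output for each tuple (assumption: one-step bootstrapped target).
    y = rew + gamma * critic(critic_p, next_obs, actor(actor_p, next_obs))

    # Critic update: gradient descent on mean 0.5 * (y - q)^2, with y held fixed.
    err = y - q
    new_critic = {
        "w_obs": critic_p["w_obs"] + lr * obs.T @ err / n,
        "w_act": critic_p["w_act"] + lr * act.T @ err / n,
        "b": critic_p["b"] + lr * err.mean(),
    }

    # Actor update using the critic: gradient ascent on mean Q(obs, actor(obs)).
    # For this linear critic dQ/da = w_act, so dQ/dW = outer(obs, w_act).
    grad_w = np.outer(obs.mean(axis=0), new_critic["w_act"])
    new_actor = {"w": actor_p["w"] + lr * grad_w}

    return new_actor, new_critic, float(np.mean(err ** 2))


# Usage on a random minibatch (3-dimensional observations, 2-dimensional actions).
rng = np.random.default_rng(0)
batch = (
    rng.normal(size=(32, 3)),  # training observations
    rng.normal(size=(32, 2)),  # training actions
    rng.normal(size=32),       # rewards
    rng.normal(size=(32, 3)),  # next observations
)
actor_p = {"w": np.zeros((3, 2))}
critic_p = {"w_obs": np.zeros(3), "w_act": np.zeros(2), "b": 0.0}
actor_p, critic_p, mse = update_step(actor_p, critic_p, batch)
```

In practice the actor and critic would be neural networks trained with automatic differentiation, and slowly updated copies of both networks are a common choice for computing the target; the linear forms here only keep the gradients short enough to write by hand.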

Bibliographic data

Publication number: MX2018000942A

Patent type:
Publication date: 2018-08-09

Original format: PDF
Applicant/patentee: DEEPMIND TECHNOLOGIES LIMITED

Application number: MX20180000942
Inventors: TIMOTHY PAUL LILLICRAP; ALEXANDER PRITZEL; NICOLAS MANFRED OTTO HEESS; TOM EREZ; DANIEL PIETER WIERSTRA; YUVAL TASSA; DAVID SILVER; JONATHAN JAMES HUNT

Filing date: 2016-07-22
Classification: G06N3/04
Country: MX
Database entry time: 2022-08-21 12:51:07
