
REINFORCEMENT LEARNING WITH AUXILIARY TASKS



Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes training an action selection policy neural network and, during the training of the action selection policy neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and to generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and to generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, the reward prediction parameters, and the action selection policy network parameters.
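The parameter sharing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the layer sizes, the tanh activation, the linear heads, the squared-error losses, the learning rate, and every name (`params`, `forward`, `auxiliary_step`) are assumptions chosen for illustration. The sketch only shows that a gradient step on the auxiliary control and reward prediction losses adjusts the two heads and also the shared layer of the action selection policy network that produces the intermediate output.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 8, 3  # hypothetical sizes

# Hypothetical parameters: one shared layer of the action selection policy
# network plus three linear heads branching from its intermediate output.
params = {
    "shared": rng.normal(scale=0.1, size=(HIDDEN_DIM, OBS_DIM)),
    "policy_head": rng.normal(scale=0.1, size=(NUM_ACTIONS, HIDDEN_DIM)),
    "aux_head": rng.normal(scale=0.1, size=(NUM_ACTIONS, HIDDEN_DIM)),
    "reward_head": rng.normal(scale=0.1, size=(1, HIDDEN_DIM)),
}


def forward(params, obs):
    """Return the intermediate output and the three head outputs."""
    hidden = np.tanh(params["shared"] @ obs)        # intermediate output
    policy_logits = params["policy_head"] @ hidden  # main policy output
    aux_output = params["aux_head"] @ hidden        # auxiliary control output
    predicted_reward = (params["reward_head"] @ hidden)[0]
    return hidden, policy_logits, aux_output, predicted_reward


def auxiliary_step(params, obs, aux_target, reward_target, lr=0.05):
    """One gradient step on the auxiliary and reward prediction losses.

    Assumes squared-error losses. Returns (new_params, loss_before_update).
    """
    hidden, _, aux_output, predicted_reward = forward(params, obs)
    aux_err = aux_output - aux_target              # shape (NUM_ACTIONS,)
    reward_err = predicted_reward - reward_target  # scalar
    loss = 0.5 * np.sum(aux_err ** 2) + 0.5 * reward_err ** 2

    # Gradients with respect to the two head parameter sets.
    grad_aux_head = np.outer(aux_err, hidden)
    grad_reward_head = reward_err * hidden[None, :]

    # Backpropagate both errors into the shared policy-network layer.
    grad_hidden = (params["aux_head"].T @ aux_err
                   + params["reward_head"][0] * reward_err)
    grad_pre = grad_hidden * (1.0 - hidden ** 2)   # derivative of tanh
    grad_shared = np.outer(grad_pre, obs)

    new_params = dict(params)
    new_params["aux_head"] = params["aux_head"] - lr * grad_aux_head
    new_params["reward_head"] = params["reward_head"] - lr * grad_reward_head
    new_params["shared"] = params["shared"] - lr * grad_shared
    return new_params, loss


# Usage with made-up data: one update, then re-evaluate the loss.
obs = rng.normal(size=OBS_DIM)
aux_target = np.array([1.0, 0.0, -1.0])
new_params, loss_before = auxiliary_step(params, obs, aux_target, 1.0)
_, loss_after = auxiliary_step(new_params, obs, aux_target, 1.0)
```

In this sketch the policy head is left unchanged by the auxiliary step, because the auxiliary and reward prediction heads branch off the intermediate output rather than the final policy output; in a full training loop the main reinforcement learning loss would update it separately.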


Bibliographic data

Publication number: EP3535705A1

Patent type:
Publication date: 2019-09-11

Original format: PDF
Applicant/Assignee: DEEPMIND TECHNOLOGIES LIMITED

Application number: EP20170808163
Inventors: MNIH VOLODYMYR; CZARNECKI WOJCIECH; JADERBERG MAXWELL ELLIOT; SCHAUL TOM; SILVER DAVID; KAVUKCUOGLU KORAY

Filing date: 2017-11-04
Classification: G06N3/04; G06N3/08
Country: EP
Date added to database: 2022-08-21 12:29:56
