
LEARNING OPTIONS FOR ACTION SELECTION WITH META-GRADIENTS IN MULTI-TASK REINFORCEMENT LEARNING

Abstract

A reinforcement learning system, method, and computer program code for controlling an agent to perform a plurality of tasks while interacting with an environment. The system learns options, where an option comprises a sequence of primitive actions performed by the agent under control of an option policy neural network. In implementations, the system discovers options that are useful for multiple different tasks by meta-learning the rewards used to train the option policy neural network while the agent is interacting with the environment.
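To make the notion of an option concrete, the following is a minimal sketch, not the method claimed in the application. It assumes a toy 1-D chain environment, a plain function standing in for the option policy neural network, and an explicit stop rule, none of which appear in the abstract; the names Option, chain_step and run_option are hypothetical. The meta-learning of rewards is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Option:
    """Hypothetical container for one option (names are illustrative only)."""
    policy: Callable[[int], int]        # state -> primitive action; stand-in for a neural network
    should_stop: Callable[[int], bool]  # state -> True when the option ends (assumed stop rule)


def chain_step(state: int, action: int, size: int = 10) -> int:
    """Toy environment (assumed): move left (-1) or right (+1) on a bounded chain."""
    return max(0, min(size - 1, state + action))


def run_option(option: Option, state: int, max_steps: int = 20) -> Tuple[int, List[int]]:
    """Execute primitive actions under the option's policy until it stops."""
    actions: List[int] = []
    for _ in range(max_steps):
        if option.should_stop(state):
            break
        action = option.policy(state)
        actions.append(action)
        state = chain_step(state, action)
    return state, actions


# An option that walks toward cell 5 and stops on arrival.
go_to_five = Option(
    policy=lambda s: 1 if s < 5 else -1,
    should_stop=lambda s: s == 5,
)

final_state, actions = run_option(go_to_five, state=2)
print(final_state, actions)
```

In this sketch a higher-level controller would choose between options such as go_to_five and single primitive actions; how that choice is made, and how the option's training reward is learned, is left out.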

Bibliographic data

Publication number: WO2021245286A1

Publication date: 2021-12-09

Original format: PDF
Applicant/patentee: DEEPMIND TECHNOLOGIES LIMITED

Application number: WO2021EP65124
Inventors: JEYA VEERAIAH VIVEK VEERIAH; ZAHAVY TOM BEN ZION; HESSEL MATTEO; XU ZHONGWEN; OH JUNHYUK; KEMEAV IURII; VAN HASSELT HADO PHILIP; SILVER DAVID; BAVEJA SATINDER SINGH

Filing date: 2021-06-07
Classification: G06N3/08; G06N3/04
Country: EP
Date added to database: 2022-08-24 22:45:22
