
REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES



Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.
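The loop described in the abstract (select a return computation scheme, run an episode, observe rewards, update the selection policy) can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the abstract does not say what form the selection policy or the schemes take, so the UCB1 bandit, the `ReturnScheme` fields (`discount`, `exploration_weight`), and the toy `run_episode` stand-in are all assumptions introduced here.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnScheme:
    """Hypothetical scheme: a discount plus a weight on an exploration bonus."""
    discount: float
    exploration_weight: float

    def compute_return(self, task_rewards, exploration_bonuses):
        """Discounted sum of task reward plus weighted exploration bonus."""
        total = 0.0
        for t, (r, b) in enumerate(zip(task_rewards, exploration_bonuses)):
            total += (self.discount ** t) * (r + self.exploration_weight * b)
        return total


class SchemeSelector:
    """Assumed selection policy: a UCB1 bandit with one arm per scheme."""

    def __init__(self, schemes, exploration_constant=1.0):
        self.schemes = list(schemes)
        self.c = exploration_constant
        self.counts = [0] * len(self.schemes)
        self.mean_reward = [0.0] * len(self.schemes)

    def select(self):
        # Try every scheme once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(self.mean_reward, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, index, episode_task_reward):
        # Incremental mean of the task reward observed under this scheme.
        self.counts[index] += 1
        n = self.counts[index]
        self.mean_reward[index] += (episode_task_reward - self.mean_reward[index]) / n


def run_episode(scheme, rng, steps=20):
    """Toy stand-in for controlling an agent; returns per-step rewards.

    Assumption for illustration only: more exploratory schemes find
    task reward more often in this fake environment.
    """
    p_reward = min(1.0, 0.1 + 0.4 * scheme.exploration_weight)
    task_rewards = [1.0 if rng.random() < p_reward else 0.0 for _ in range(steps)]
    exploration_bonuses = [rng.random() for _ in range(steps)]
    return task_rewards, exploration_bonuses


def train(schemes, num_episodes=200, seed=0):
    rng = random.Random(seed)
    selector = SchemeSelector(schemes)
    for _ in range(num_episodes):
        idx = selector.select()
        task_rewards, bonuses = run_episode(schemes[idx], rng)
        # The return a real agent would be trained to maximize (unused by the toy).
        _ = schemes[idx].compute_return(task_rewards, bonuses)
        # The selection policy is updated from the rewards the episode produced.
        selector.update(idx, sum(task_rewards))
    return selector
```

Calling `train([ReturnScheme(0.99, 0.0), ReturnScheme(0.99, 1.0)])` returns a selector whose `counts` show how often each scheme was chosen. In this sketch the selector is updated with the summed task reward only, so that schemes are compared on a common scale; that is a design choice made here, not something the abstract specifies.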


Bibliographic details

Publication number: WO2021156518A1

Patent type:
Publication date: 2021-08-12

Original format: PDF
Applicant/patentee: DEEPMIND TECHNOLOGIES LIMITED

Application number: WO2021EP52988
Inventors: BADIA ADRIÀ PUIGDOMÈNECH; PIOT BILAL; SPRECHMANN PABLO; KAPTUROWSKI STEVEN JAMES; VITVITSKYI ALEX; GUO ZHAOHAN; BLUNDELL CHARLES

Filing date: 2021-02-08
Classification: G06N3; G06N3/04; G06N3/08; G06N7
Country: EP
Database entry time: 2022-08-24 20:36:24
