REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.
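The loop described in the abstract (select a scheme, run an episode, collect rewards, update the selection policy) can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not state: each return computation scheme is modeled as a discount factor plus a weight on an exploration bonus, the selection policy is a UCB1 bandit, and the agent and environment are replaced by a toy two-action task. All names here (`ReturnScheme`, `SchemeSelector`, `run_episode`, `train`, `SAFE_REWARD`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass

SAFE_REWARD = 0.3  # assumed payoff of the known "safe" action in the toy task


@dataclass(frozen=True)
class ReturnScheme:
    """Hypothetical scheme: a discount plus a weight on an exploration bonus."""
    discount: float
    exploration_weight: float

    def mixed_reward(self, reward, bonus):
        # Per-step reward under this scheme: task reward plus weighted bonus.
        return reward + self.exploration_weight * bonus

    def compute_return(self, rewards, bonuses):
        # Discounted sum of the per-step mixed rewards.
        return sum(self.discount ** t * self.mixed_reward(r, b)
                   for t, (r, b) in enumerate(zip(rewards, bonuses)))


class SchemeSelector:
    """UCB1 bandit over schemes (an assumed form of the selection policy)."""

    def __init__(self, schemes, c=1.0):
        self.schemes = list(schemes)
        self.c = c
        self.counts = [0] * len(self.schemes)
        self.means = [0.0] * len(self.schemes)

    def select(self):
        # Try every scheme once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [m + self.c * math.sqrt(math.log(total) / n)
                  for m, n in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, index, episode_reward):
        # Incremental mean of the task reward observed under each scheme.
        self.counts[index] += 1
        self.means[index] += (episode_reward - self.means[index]) / self.counts[index]


def run_episode(scheme, rng, steps=10):
    """Toy stand-in for controlling an agent under the selected scheme.

    The "novel" action pays 1.0 with probability 0.8, but its value estimate
    starts at 0, so the agent only tries it when the scheme's exploration
    bonus makes it look better than the safe action.
    """
    estimate, tries = 0.0, 0
    rewards, bonuses = [], []
    for _ in range(steps):
        bonus = 1.0 / (1 + tries)  # novelty bonus decays with each try
        if scheme.mixed_reward(estimate, bonus) > SAFE_REWARD:
            reward = 1.0 if rng.random() < 0.8 else 0.0
            tries += 1
            estimate += (reward - estimate) / tries
            rewards.append(reward)
            bonuses.append(bonus)
        else:
            rewards.append(SAFE_REWARD)
            bonuses.append(0.0)
    return rewards, bonuses


def train(schemes, episodes=300, seed=0):
    rng = random.Random(seed)
    selector = SchemeSelector(schemes)
    for _ in range(episodes):
        index = selector.select()                         # select a scheme
        rewards, _ = run_episode(selector.schemes[index], rng)  # run the episode
        selector.update(index, sum(rewards))              # update with task rewards
    return selector


if __name__ == "__main__":
    candidates = [ReturnScheme(0.99, w) for w in (0.0, 0.5, 1.0)]
    result = train(candidates)
    print("times each scheme was selected:", result.counts)
```

In this sketch the selector is updated with the plain sum of task rewards, so schemes are compared on task performance even though each one shapes the agent's behavior through a different exploration weight. That choice is also an assumption; the abstract says only that the policy is updated using the identified rewards.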
