CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION

Abstract

A system and method for controlling an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. The system thereby optimizes a hierarchical policy that is the product of a preference policy and a preference-conditioned action selection policy. In this way, the system learns to jointly optimize a set of objectives relating to the rewards and costs received during the task, while also learning the preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
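The hierarchical policy described in the abstract can be illustrated with a minimal sketch. It assumes things the abstract does not specify: a small discrete set of preference vectors (weights on reward and cost), softmax policies, a preference-conditioned action policy that scores actions by a weighted sum of fixed toy per-objective values, and a hypothetical logit update that favours preferences whose conditioned policy meets a cost limit. All names (`PREFERENCES`, `Q_REWARD`, `Q_COST`, `update_preference_logits`) are illustrative only; this is not the patent's network architecture or training procedure.

```python
import math
import random

# Hypothetical toy setting (assumption, not from the patent): two objectives,
# a discrete preference set, and fixed per-action value estimates for one state.
PREFERENCES = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]  # (weight on reward, weight on cost)
Q_REWARD = [1.0, 2.0, 3.0]  # per-action reward estimates
Q_COST = [0.0, 1.0, 3.0]    # per-action cost estimates


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_policy(pref_logits):
    """pi_pref(eps): categorical distribution over the preference set."""
    return softmax(pref_logits)


def action_policy(pref):
    """pi(a | s, eps): softmax over an assumed weighted scalarization of the objectives."""
    w_r, w_c = pref
    return softmax([w_r * r - w_c * c for r, c in zip(Q_REWARD, Q_COST)])


def hierarchical_joint(pref_logits):
    """pi(eps, a | s) = pi_pref(eps) * pi(a | s, eps) for every (eps, a) pair."""
    p_pref = preference_policy(pref_logits)
    return [[p_e * p_a for p_a in action_policy(eps)]
            for p_e, eps in zip(p_pref, PREFERENCES)]


def sample(pref_logits, rng):
    """Draw a preference first, then an action conditioned on that preference."""
    i = rng.choices(range(len(PREFERENCES)), weights=preference_policy(pref_logits))[0]
    a = rng.choices(range(len(Q_REWARD)), weights=action_policy(PREFERENCES[i]))[0]
    return i, a


def expected_cost(pref):
    """Expected cost of the action policy conditioned on one preference."""
    return sum(p * c for p, c in zip(action_policy(pref), Q_COST))


def update_preference_logits(pref_logits, cost_limit, lr=1.0):
    """Hypothetical heuristic: raise the logits of preferences whose conditioned
    policy satisfies E[cost] <= cost_limit and lower the others."""
    return [l + lr * (1.0 if expected_cost(eps) <= cost_limit else -1.0)
            for l, eps in zip(pref_logits, PREFERENCES)]
```

For example, `hierarchical_joint([0.0, 0.0, 0.0])` returns a 3×3 table whose entries sum to 1, and one call to `update_preference_logits([0.0, 0.0, 0.0], cost_limit=1.0)` shifts probability toward the cost-heavy preference `(0.1, 0.9)`, the only one whose conditioned policy meets the limit in this toy setting. Factoring the policy this way lets the preference distribution and the action policy be trained separately while their product still defines a single policy.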

Bibliographic data

  • Publication number: WO2022069743A1
  • Publication date: 2022-04-07
  • Applicant: DEEPMIND TECHNOLOGIES LIMITED
  • Application number: WO2021EP77177
  • Filing date: 2021-10-01
  • Inventors: HUANG SANDY HAN; ABDOLMALEKI ABBAS
  • Classification: G06N3; G06N3/04; G06N3/08
  • Country: EP
