
PALO bounds for reinforcement learning in partially observable stochastic games


Abstract

A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs), known as MCES-P, integrates Monte Carlo exploring starts (MCES) into a local search of the policy space, offering an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is tremendously more complex than in single-agent settings due to the heterogeneity of the agents and the discrepancy between their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDPs (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDPs (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDPs (MCES-FMP) has each agent individually map joint observations to its own actions. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates as PALO learning. We promote sample efficiency with a policy space pruning technique, evaluate the approaches on six benchmark domains, and compare them with state-of-the-art techniques; the results demonstrate that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. (C) 2020 Elsevier B.V. All rights reserved.
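The templates summarized in the abstract share a common skeleton: a local search over the policy space in which a neighboring policy is adopted only when Monte Carlo estimates of its value exceed the current policy's value by more than an error radius derived from a concentration bound. The Python sketch below illustrates that skeleton under stated assumptions; the environment interface (reset/step), the neighbours generator, the dict-based policy, and the Hoeffding-style error radius are illustrative placeholders, not the paper's exact MCES-P/MCES-IP procedure or its PALO constants.

    import math

    def hoeffding_epsilon(n, delta, value_range):
        """Error radius after n samples at confidence 1 - delta (Hoeffding bound)."""
        return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    def rollout_value(env, policy, horizon):
        """One Monte Carlo rollout of `policy` (a dict: observation -> action);
        returns the cumulative reward."""
        obs = env.reset()
        total = 0.0
        for _ in range(horizon):
            obs, reward, done = env.step(policy[obs])
            total += reward
            if done:
                break
        return total

    def local_policy_search(env, policy, neighbours, horizon,
                            n_samples=200, delta=0.05, value_range=1.0):
        """Hill-climb in policy space, accepting a neighbour only when its sampled
        value beats the current policy's by more than twice the error radius
        (a PALO-style acceptance test; constants here are illustrative)."""
        eps = hoeffding_epsilon(n_samples, delta, value_range)
        improved = True
        while improved:
            improved = False
            v_cur = sum(rollout_value(env, policy, horizon)
                        for _ in range(n_samples)) / n_samples
            for cand in neighbours(policy):
                v_cand = sum(rollout_value(env, cand, horizon)
                             for _ in range(n_samples)) / n_samples
                if v_cand - v_cur > 2.0 * eps:
                    policy, improved = cand, True
                    break
        return policy

In this reading, MCES-IP would additionally condition rollouts on predicted actions of the other agent, while MCES-MP and MCES-FMP would change the policy's domain and range (joint observations to joint or individual actions); those variations are not shown here.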

Bibliographic details

  • Source
    Neurocomputing | 2021, Issue 8 | pp. 36-56 | 21 pages
  • Author affiliations

    Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

    Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

    Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

    Univ Southern Mississippi Sch Comp Sci & Comp Engn Hattiesburg MS 39406 USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English (eng)
  • CLC classification
  • Keywords

    Multiagent systems; Reinforcement learning; POMDP; POSG;

