PALO bounds for reinforcement learning in partially observable stochastic games

Ceren Roi; He Keyang; Doshi Prashant; Banerjee Bikramjit

Abstract

A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs), abbreviated MCES-P, integrates Monte Carlo exploring starts (MCES) into a local search of the policy space to offer an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is tremendously more complex than in single-agent settings due to the heterogeneity of agents and the discrepancy of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDP (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDP (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDP (MCES-FMP) has each agent individually mapping joint observations to its own action. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique. We evaluate the approaches on six benchmark domains and compare them with state-of-the-art techniques; the results demonstrate that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. (C) 2020 Elsevier B.V. All rights reserved.
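
As a rough illustration of how a sample-based bound can drive a local policy search of the kind the abstract describes, the sketch below uses the standard one-sided Hoeffding inequality to decide whether a neighboring policy looks better than the current one. It is not the paper's algorithm or its exact PALO bound: the function names, the `value_range` parameter, and the idea of comparing paired return differences are assumptions made for illustration only.

```python
import math


def hoeffding_epsilon(n_samples: int, delta: float, value_range: float) -> float:
    """Margin from the one-sided Hoeffding inequality.

    With probability at least 1 - delta, the empirical mean of n_samples
    i.i.d. values whose spread is at most value_range exceeds the true
    mean by less than the returned amount.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))


def neighbor_is_better(return_diffs, delta: float, value_range: float) -> bool:
    """Accept the neighbor if its mean advantage clears the Hoeffding margin.

    return_diffs holds sampled (neighbor return - current return) values.
    value_range is an assumed bound on the spread of those differences
    (hypothetical parameter, not taken from the paper).
    """
    n = len(return_diffs)
    mean_diff = sum(return_diffs) / n
    return mean_diff > hoeffding_epsilon(n, delta, value_range)
```

The margin shrinks as more samples are collected, so a small but real advantage is eventually detected. This check is valid for a fixed sample size; testing repeatedly as samples arrive, or across many neighbors, would additionally require splitting `delta` across the tests.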

Bibliographic details

Source
Neurocomputing | 2021, Issue 8 | pp. 36-56 | 21 pages
Authors
Ceren Roi; He Keyang; Doshi Prashant; Banerjee Bikramjit;
Author affiliations

Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

Univ Georgia Dept Comp Sci THINC Lab Athens GA 30602 USA;

Univ Southern Mississippi Sch Comp Sci & Comp Engn Hattiesburg MS 39406 USA;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Multiagent systems; Reinforcement learning; POMDP; POSG;


