Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES-IP is instantiated to be approximately locally optimal with some probability by deriving a theoretical bound on the sample size, which depends in part on the allowed sampling error; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains, while using far fewer samples per transformation.
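To make the MCES-P template concrete, the following is a minimal sketch of Monte Carlo exploring starts combined with local search over memoryless policies. The toy environment (a hidden binary state observed through noise), the acceptance threshold `epsilon`, and all function names are illustrative assumptions, not the paper's actual algorithm or its PAC sample-size bound.

```python
import random

# Toy partially observable problem: hidden binary state, noisy binary observation.
OBSERVATIONS = [0, 1]
ACTIONS = [0, 1]

def rollout_value(policy, episodes=200, rng=None):
    """Monte Carlo estimate of a policy's value with exploring starts:
    every episode begins from a uniformly random hidden state."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(episodes):
        state = rng.choice([0, 1])                         # exploring start
        obs = state if rng.random() < 0.8 else 1 - state   # noisy observation
        action = policy[obs]
        total += 1.0 if action == state else 0.0           # reward: match hidden state
    return total / episodes

def mces_local_search(epsilon=0.05, rng=None):
    """Local search over deterministic memoryless policies (obs -> action).
    A one-observation transformation is accepted only if its Monte Carlo
    estimate beats the current policy's by more than epsilon (a crude
    stand-in for the probabilistic comparison in a PAC instantiation)."""
    rng = rng or random.Random(42)
    policy = {o: rng.choice(ACTIONS) for o in OBSERVATIONS}
    improved = True
    while improved:
        improved = False
        base = rollout_value(policy, rng=rng)
        for o in OBSERVATIONS:
            for a in ACTIONS:
                if a == policy[o]:
                    continue
                neighbor = dict(policy)
                neighbor[o] = a
                if rollout_value(neighbor, rng=rng) > base + epsilon:
                    policy = neighbor
                    base = rollout_value(neighbor, rng=rng)
                    improved = True
    return policy

learned = mces_local_search()
```

Under this reward model the best memoryless policy simply repeats the observation, achieving an expected value of 0.8; the local search reliably reaches it from any starting policy because every suboptimal neighbor is separated by a value gap much larger than `epsilon`.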