International Conference on Autonomous Agents and Multiagent Systems

Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds



Abstract

Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES-IP is instantiated to be approximately locally optimal with some probability by deriving a theoretical bound on the sample size that depends in part on the allowed sampling error; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains while utilizing far fewer samples per transformation.
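For illustration only, the sketch below shows one way the MCES-IP template described above might be organized: a belief over candidate models of the other agent is maintained and used to predict that agent's action, a generic Hoeffding-style bound stands in for the paper's PAC sample-size derivation, and a local-search step accepts a policy transformation only when its Monte Carlo value estimate exceeds the current action's by more than the allowed error. The class name MCESIPSketch, the sample_episode callback, and the predict()/likelihood() model interfaces are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def hoeffding_sample_size(value_range, epsilon, delta):
    """Generic Hoeffding-style bound (a stand-in for the paper's derivation):
    number of Monte Carlo returns needed so the empirical mean is within
    epsilon of the true mean with probability at least 1 - delta."""
    return math.ceil((value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


class MCESIPSketch:
    """Hypothetical skeleton of an MCES-IP-style learner, for illustration only."""

    def __init__(self, actions, opponent_models, value_range, epsilon, delta):
        self.actions = actions              # this agent's action set
        self.epsilon = epsilon              # allowed error from sampling
        # Start with a uniform belief over candidate models of the other agent.
        self.belief = {m: 1.0 / len(opponent_models) for m in opponent_models}
        self.q = defaultdict(float)         # Q[(obs_history, predicted_other_action, action)]
        self.k = hoeffding_sample_size(value_range, epsilon, delta)

    def predict_other_action(self, obs_history):
        # Predict the other agent's action by weighting each model's prediction
        # with the current belief (predict() is an assumed model interface).
        weighted = defaultdict(float)
        for model, prob in self.belief.items():
            weighted[model.predict(obs_history)] += prob
        return max(weighted, key=weighted.get)

    def update_belief(self, observed_other_action, obs_history):
        # Bayes-style update of the belief over opponent models
        # (likelihood() is an assumed model interface).
        for model in self.belief:
            self.belief[model] *= model.likelihood(observed_other_action, obs_history)
        total = sum(self.belief.values()) or 1.0
        for model in self.belief:
            self.belief[model] /= total

    def improve_policy(self, policy, sample_episode):
        # One local-search step over neighbouring policies: for each observation
        # history, estimate the value of each alternative action from k Monte Carlo
        # returns (exploring starts supplied by the sample_episode callback) and
        # accept a transformation only if it beats the current action by > epsilon.
        for obs_history, current_action in list(policy.items()):
            predicted = self.predict_other_action(obs_history)
            baseline = sum(sample_episode(policy, obs_history, current_action)
                           for _ in range(self.k)) / self.k
            for candidate in self.actions:
                if candidate == current_action:
                    continue
                estimate = sum(sample_episode(policy, obs_history, candidate)
                               for _ in range(self.k)) / self.k
                self.q[(obs_history, predicted, candidate)] = estimate
                if estimate > baseline + self.epsilon:
                    policy[obs_history] = candidate  # accept the transformation
                    break
        return policy
```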

