Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach in which agents take turns learning best responses to each other's policies. This decentralizes the policy computation problem and relaxes the reliance on full knowledge of the problem parameters. We derive the relation between the sample complexity of best response learning and the error tolerance. Our key contribution is to show that the sample complexity could grow exponentially with the problem horizon. We show empirically that even when the sample requirement is set lower than what the theory demands, our learning approach can produce (near-)optimal policies in some benchmark Dec-POMDP problems.
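
The abstract describes the approach only at a high level. The following Python sketch illustrates the general idea of alternating best-response reinforcement learning on a toy two-agent problem. Everything here — ToyDecEnv, learn_best_response, the history-based tabular Q-learner, and all parameter values — is a hypothetical illustration under simplifying assumptions, not the authors' algorithm, their sample bounds, or their benchmark domains.

```python
import random
from collections import defaultdict

# Toy 2-agent cooperative environment (purely illustrative, NOT one of the
# paper's benchmark Dec-POMDP domains): a finite-horizon coordination game
# in which the team earns +1 whenever both agents pick the same action.
class ToyDecEnv:
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.actions = [0, 1]

    def reset(self):
        self.t = 0
        return (0, 0)  # initial private observations for agents 0 and 1

    def step(self, joint_action):
        a0, a1 = joint_action
        reward = 1.0 if a0 == a1 else 0.0
        self.t += 1
        done = self.t >= self.horizon
        obs = (random.randint(0, 1), random.randint(0, 1))  # noisy private signals
        return obs, reward, done


def greedy(q, hist, actions):
    return max(actions, key=lambda a: q[(hist, a)])


def learn_best_response(env, learner, fixed_policy, episodes=2000,
                        alpha=0.1, epsilon=0.1):
    """Tabular Q-learning over the learner's local observation histories
    while the other agent follows its fixed policy."""
    q = defaultdict(float)
    for _ in range(episodes):
        obs = env.reset()
        hists = ["", ""]
        done = False
        while not done:
            hists = [h + str(o) for h, o in zip(hists, obs)]
            if random.random() < epsilon:           # epsilon-greedy exploration
                a_l = random.choice(env.actions)
            else:
                a_l = greedy(q, hists[learner], env.actions)
            a_f = fixed_policy(hists[1 - learner])  # other agent's fixed policy
            joint = (a_l, a_f) if learner == 0 else (a_f, a_l)
            obs, r, done = env.step(joint)
            key = (hists[learner], a_l)
            nxt = hists[learner] + str(obs[learner])
            target = r if done else r + max(q[(nxt, a)] for a in env.actions)
            q[key] += alpha * (target - q[key])
    return lambda hist: greedy(q, hist, env.actions)


# Alternating best-response learning: agents take turns improving their
# policy against the other agent's most recently learned policy.
env = ToyDecEnv()
policies = [lambda h: 0, lambda h: 0]   # arbitrary initial policies
for sweep in range(4):
    for i in (0, 1):
        policies[i] = learn_best_response(env, i, policies[1 - i])
```

In this sketch each call to learn_best_response uses a fixed episode budget chosen arbitrarily; the abstract's point is that choosing such a budget to guarantee a given error tolerance may require a number of samples growing exponentially with the horizon, although smaller budgets can still yield (near-)optimal policies empirically.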
