Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach, where agents take turns to learn best responses to each other's policies. This promotes decentralization of the policy computation problem, and relaxes reliance on the full knowledge of the problem parameters. We derive the relation between the sample complexity of best response learning and error tolerance. Our key contribution is to show that sample complexity could grow exponentially with the problem horizon. We show empirically that even if the sample requirement is set lower than what theory demands, our learning approach can produce (near) optimal policies in some benchmark Dec-POMDP problems.
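The turn-taking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical one-step, two-agent cooperative game with a made-up shared payoff table (REWARD), Gaussian reward noise, and a fixed per-turn sample budget (n_samples). All function names are illustrative. Each agent in turn estimates its action values purely from samples while the other agent's choice is held fixed, then switches to the empirically best response.

```python
import random

# Hypothetical shared payoff table for two cooperating agents (not from the paper).
REWARD = {
    (0, 0): 1.0, (0, 1): 0.0,
    (1, 0): 0.0, (1, 1): 2.0,
}

def sample_reward(a0, a1, rng):
    """Return a noisy shared reward; learners see only samples, never REWARD."""
    return REWARD[(a0, a1)] + rng.gauss(0.0, 0.1)

def learn_best_response(learner, policy, n_samples, rng):
    """Estimate the learner's action values against the other agent's fixed action."""
    estimates = []
    for action in (0, 1):
        joint = list(policy)
        joint[learner] = action
        total = sum(sample_reward(joint[0], joint[1], rng) for _ in range(n_samples))
        estimates.append(total / n_samples)
    # Pick the action with the highest empirical mean reward.
    return max((0, 1), key=lambda a: estimates[a])

def alternate_best_responses(initial_policy, n_samples, max_rounds=10, seed=0):
    """Agents take turns learning; stop when a full round changes nothing."""
    rng = random.Random(seed)
    policy = list(initial_policy)
    for _ in range(max_rounds):
        changed = False
        for learner in (0, 1):  # the other agent's action stays fixed this turn
            best = learn_best_response(learner, policy, n_samples, rng)
            if best != policy[learner]:
                policy[learner] = best
                changed = True
        if not changed:
            break
    return tuple(policy)

if __name__ == "__main__":
    print(alternate_best_responses((0, 1), n_samples=20))
    print(alternate_best_responses((1, 0), n_samples=20))
```

In this toy game the joint action reached depends on the starting point: alternating best responses stop at a joint action that neither agent can improve on alone, which need not be the globally best one. A smaller n_samples makes each estimate noisier, which is the trade-off between sample budget and error tolerance that the abstract refers to.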

Bibliographic details

Source: IAAI-12; Innovative applications of artificial intelligence conference; AAAI conference on artificial intelligence; Symposium on educational advances in artificial intelligence; AAAI-12; EAAI-12. 2012, pp. 1256-1262 (7 pages)
Authors: Bikramjit Banerjee; Jeremy Lyle; Landon Kraemer; Rajesh Yellamraju
Original format: PDF
Classification (Chinese Library Classification): Theory of artificial intelligence
