Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs

Abstract

We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but its worst-case complexity (NEXP-complete) makes it difficult to solve optimally. Recent advances suggest recasting Dec-POMDPs as continuous-state, deterministic MDPs. In this form, however, states and actions are embedded in high-dimensional spaces, which makes accurate state estimation and greedy action selection intractable for all but trivially sized problems. The primary contribution of this paper is the first framework for monitoring errors during approximate state estimation and action selection. This framework lets us convert state-of-the-art exact methods into error-bounded algorithms, yielding increased scalability, as demonstrated by experiments on problems of unprecedented size.
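As a generic illustration of why monitoring per-step approximation error is meaningful in the discounted infinite-horizon setting, the sketch below uses the standard geometric-series argument: if each approximate step introduces an error of at most `eps` and future errors are discounted by `gamma` in [0, 1), the total discounted error is at most `eps / (1 - gamma)`. This is not the specific bound derived in the paper, and the function names are hypothetical.

```python
# Generic illustration (NOT the paper's specific bound): if every step of an
# approximate, discounted infinite-horizon computation introduces an error of
# at most eps, the total discounted error is at most eps / (1 - gamma).


def discounted_error_bound(eps: float, gamma: float) -> float:
    """Closed form of sum_{t>=0} gamma^t * eps, i.e. eps / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    if eps < 0.0:
        raise ValueError("per-step error eps must be non-negative")
    return eps / (1.0 - gamma)


def truncated_discounted_error(eps: float, gamma: float, horizon: int) -> float:
    """Worst-case accumulated error over the first `horizon` steps."""
    return sum(eps * gamma ** t for t in range(horizon))


if __name__ == "__main__":
    eps, gamma = 0.01, 0.9
    bound = discounted_error_bound(eps, gamma)
    # Partial sums approach the closed-form bound from below as the horizon grows.
    for h in (10, 50, 200):
        print(h, truncated_discounted_error(eps, gamma, h), bound)
```

Because the partial sums never exceed the closed form, a per-step tolerance `eps` can be chosen from a target overall error `delta` as `eps = delta * (1 - gamma)` under these assumptions.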

Bibliographic details

Source: European Conference on Machine Learning and Knowledge Discovery in Databases, 2014, pp. 338-353 (16 pages)
Authors: Jilles S. Dibangoye; Olivier Buffet; François Charpillet
Original format: PDF
Keywords: decentralized stochastic control; error-bounded approximations

Similar documents

1. Neural approximations in discounted infinite-horizon stochastic optimal control problems [J]. Giorgio Gnecco, Marcello Sanguineti. Engineering Applications of Artificial Intelligence, 2018, issue SEPa.

2. Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs [J]. Christopher Amato, Daniel S. Bernstein, Shlomo Zilberstein. Autonomous Agents and Multi-Agent Systems, 2010, no. 3.

3. Neural approximations for infinite-horizon optimal control of nonlinear stochastic systems [J]. T. Parisini, R. Zoppoli. IEEE Transactions on Neural Networks, 1998, no. 6.

4. Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs [C]. Jilles S. Dibangoye, Olivier Buffet, François Charpillet. European Conference on Machine Learning and Knowledge Discovery in Databases, 2014.

5. Estimating individual level discount factors and testing competing discounting hypotheses [D]. Andrew Gerald Meyer, 2009.

6. Modeling and Planning with Macro-Actions in Decentralized POMDPs [O]. Christopher Amato, George Konidaris, Leslie P. Kaelbling.

7. Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs [O]. Jilles Steeve Dibangoye, Olivier Buffet, François Charpillet, 2014.

8. Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs [R]. B. Banerjee, L. Kraemer, 2012.
