Conference: Association for the Advancement of Artificial Intelligence Symposium

Solving DEC-POMDPs by Expectation Maximization of Value Functions


Abstract

We present a new algorithm, PIEM, that approximately solves for the policy of an infinite-horizon decentralized partially observable Markov decision process (DEC-POMDP). The algorithm uses expectation maximization (EM) only in the policy improvement step; policy evaluation is performed by solving Bellman's equation in terms of finite state controllers (FSCs). This is the key distinction between PIEM and the earlier EM algorithm of Kumar and Zilberstein (2010): PIEM operates directly on a DEC-POMDP without transforming it into a mixture of dynamic Bayes nets. Thus, PIEM precisely maximizes the value function while avoiding complicated forward/backward message passing and its computational and memory cost. To overcome local optima, we follow Pajarinen and Peltonen (2011): we solve the DEC-POMDP for a finite-length horizon and use the resulting policy graph to initialize the FSCs. We solve the finite-horizon problem with a modified point-based policy generation (PBPG) algorithm that provides a closed-form solution for a step the original PBPG solved by linear programming. Experimental results on benchmark problems show that the proposed algorithms compare favorably to state-of-the-art methods.
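The policy evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two agents, each with a stochastic FSC whose next node depends only on the current node and the agent's own observation, and it solves the resulting linear Bellman system directly. The function name `evaluate_joint_fsc` and all array layouts are hypothetical choices made for this illustration, and the EM-based improvement step of PIEM is not shown.

```python
import numpy as np

def evaluate_joint_fsc(pi1, pi2, eta1, eta2, T, O, R, gamma):
    """Value V[q1, q2, s] of two fixed stochastic FSCs (illustrative layout).

    Assumed array layouts (not taken from the paper):
      pi_i[q, a]        P(action a | node q) for agent i
      eta_i[q, o, q2]   P(next node q2 | node q, own observation o)
      T[s, a1, a2, s2]  P(next state s2 | state s, joint action)
      O[a1, a2, s2, o1, o2]  P(joint observation | joint action, next state)
      R[s, a1, a2]      immediate reward
    """
    n1, n2, S = pi1.shape[0], pi2.shape[0], R.shape[0]
    # Expected immediate reward for every (q1, q2, s).
    r = np.einsum('pa,qb,sab->pqs', pi1, pi2, R)
    # Transition kernel of the Markov chain over (q1, q2, s) induced by the
    # controllers: marginalize actions (a, b) and observations (x, y).
    P = np.einsum('pa,qb,sabt,abtxy,pxu,qyv->pqsuvt',
                  pi1, pi2, T, O, eta1, eta2)
    N = n1 * n2 * S
    # Bellman's equation for a fixed policy is linear: (I - gamma * P) v = r.
    v = np.linalg.solve(np.eye(N) - gamma * P.reshape(N, N), r.reshape(N))
    return v.reshape(n1, n2, S)
```

For a fixed pair of controllers the joint process over (node of agent 1, node of agent 2, state) is a Markov chain, so evaluation reduces to one linear solve. The dense solve is only practical for small controllers and state spaces; larger problems would call for an iterative or sparse solver.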
