Journal: Autonomous Agents and Multi-Agent Systems

Graphical Models for Interactive POMDPs: Representations and Solutions

Abstract

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables, and the dependencies between the variables. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting that is populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms that are applicable to DIDs. Solving I-DIDs exactly involves knowing the solutions of possible models of the other agents. The space of models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number of other agents' candidate models at each time step to a constant. We do this by clustering models that are likely to be behaviorally equivalent and selecting a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
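The approximation step described above (cluster the other agent's candidate models, then keep a representative set) can be illustrated with a minimal sketch. Several things here are assumptions and are not stated in the abstract: each candidate model is reduced to its belief vector over states, closeness of beliefs serves as a rough proxy for "likely to be behaviorally equivalent", and plain k-means is the clustering method. The function names `kmeans` and `select_representatives` are hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means on equal-length tuples with deterministic seeding."""
    k = min(k, len(points))
    # Seed centroids with k points spread evenly through the list.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def select_representatives(beliefs, k):
    """Keep at most k models: the member nearest each cluster centroid.

    Assumption: a model is identified by its belief vector alone.
    """
    centroids, clusters = kmeans(beliefs, k)
    reps = []
    for centroid, members in zip(centroids, clusters):
        if members:
            reps.append(min(members, key=lambda b: dist(b, centroid)))
    return reps


# Six hypothetical candidate models over a two-state problem.
candidate_beliefs = [
    (0.9, 0.1), (0.85, 0.15), (0.8, 0.2),
    (0.2, 0.8), (0.1, 0.9), (0.15, 0.85),
]
reps = select_representatives(candidate_beliefs, k=2)
```

Applying such a selection at every time step would keep the number of retained models bounded by the constant k, which is the effect the abstract describes. How well belief proximity tracks true behavioral equivalence depends on the models' solutions, which this sketch does not compute.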
