European Conference on Artificial Intelligence

Leader-Follower MDP Models with Factored State Space and Many Followers - Followers Abstraction, Structured Dynamics and State Aggregation



Abstract

The Leader-Follower Markov Decision Processes (LF-MDP) framework extends both Markov Decision Processes (MDP) and Stochastic Games. It provides a model in which an agent (the leader) can influence a set of other agents (the followers) playing a stochastic game, by modifying their immediate reward functions but not their dynamics. All agents are assumed to act selfishly and to optimize their own long-term expected reward. Finding equilibrium strategies in an LF-MDP is hard, especially when the joint state space of the followers is factored: in that case, solution time is exponential in the number of followers. Our theoretical contribution is threefold. First, we analyze a natural assumption (substitutability of followers), which holds in many applications. Under this assumption, we show that an LF-MDP can be solved exactly in polynomial time, provided that deterministic equilibria exist for all games encountered in the LF-MDP. Second, we show that an additional assumption of sparsity of the problem dynamics allows us to decrease the exponent of the polynomial. Finally, we present a state-aggregation approximation, which further decreases the exponent and allows us to approximately solve large problems. We empirically validate the LF-MDP approach on a class of realistic animal disease control problems. For problems of this class, we find deterministic equilibria for all games. Using our first two results, we are able to solve the exact LF-MDP problem with 15 followers (compared to 6 or 7 in the original model). Using state aggregation, problems with up to 50 followers can be solved approximately. The approximation quality is evaluated by comparison with the exact approach on problems with 12 and 15 followers.
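The benefit of followers abstraction can be illustrated with a small counting sketch. The Python snippet below is not taken from the paper; it is a minimal illustration, assuming substitutable (interchangeable) followers that each occupy one of K local states, of why a count-based aggregated state space grows polynomially in the number of followers while the full factored joint space grows exponentially. The function names and the choice K = 3 are hypothetical.

```python
from itertools import product
from math import comb

# Illustrative sketch (not the paper's algorithm): with N substitutable
# followers, each in one of K local states, only the number of followers
# in each local state matters, not which follower is in which state.

def joint_state_space_size(n_followers: int, k_local_states: int) -> int:
    """Size of the full factored joint state space: K**N (exponential in N)."""
    return k_local_states ** n_followers

def aggregated_state_space_size(n_followers: int, k_local_states: int) -> int:
    """Number of count vectors (c_1, ..., c_K) summing to N: C(N+K-1, K-1),
    which is polynomial in N for a fixed number of local states K."""
    return comb(n_followers + k_local_states - 1, k_local_states - 1)

def aggregate(joint_state: tuple, k_local_states: int) -> tuple:
    """Map a joint follower state to its count-based abstraction."""
    return tuple(joint_state.count(s) for s in range(k_local_states))

if __name__ == "__main__":
    N, K = 15, 3  # e.g. 15 followers, 3 local states per follower
    print("joint states:     ", joint_state_space_size(N, K))       # 14348907
    print("aggregated states:", aggregated_state_space_size(N, K))  # 136
    # Sanity check on a tiny instance: aggregation is many-to-one and
    # produces exactly C(N+K-1, K-1) distinct abstract states.
    small = {aggregate(js, K) for js in product(range(K), repeat=4)}
    assert len(small) == aggregated_state_space_size(4, K)
```

Under this hypothetical encoding, exact computation over the aggregated space scales polynomially in the number of followers, which is consistent with the abstract's claim that substitutability enables exact polynomial-time solution and that state aggregation pushes the approach to larger problems.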
