首页> 外文会议>International Integer Programming and Combinatorial Optimization Conference >A Pumping Algorithm for Ergodic Stochastic Mean Payoff Games with Perfect Information
【24h】

A Pumping Algorithm for Ergodic Stochastic Mean Payoff Games with Perfect Information

机译:具有完美信息的ergodic随机平均支付游戏的泵送算法

获取原文

摘要

In this paper, we consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph G = (V = V_B ∪ V_W ∪ V_R, E), with local rewards r: E → R, and three types of vertices: black V_B, white V_W, and random V_R. The game is played by two players, White and Black: When the play is at a white (black) vertex v, White (Black) selects an outgoing arc (v, u). When the play is at a random vertex v, a vertex u is picked with the given probability p(v, u). In all cases, Black pays White the value r(v, u). The play continues forever, and White aims to maximize (Black aims to minimize) the limiting mean (that is, average) payoff. It was recently shown in [7] that BWR-games are polynomially equivalent with the classical Gillette games, which include many well-known subclasses, such as cyclic games, simple stochastic games (SSG's), stochastic parity games, and Markov decision processes. In this paper, we give a new algorithm for solving BWR-games in the ergodic case, that is when the optimal values do not depend on the initial position. Our algorithm solves a BWR-game by reducing it, using a potential transformation, to a canonical form in which the optimal strategies of both players and the value for every initial position are obvious, since a locally optimal move in it is optimal in the whole game. We show that this algorithm is pseudo-polynomial when the number of random nodes is constant. We also provide an almost matching lower bound on its running time, and show that this bound holds for a wider class of algorithms. Let us add that the general (non-ergodic) case is at least as hard as SSG's, for which no pseudo-polynomial algorithm is known.
机译:在本文中,我们考虑与完美信息,或BWR-Games的两人Zero-Supastic均衡游戏,由Digraph G =(v = v_b∪v_w∪v_r,e)给出,本地奖励r:e→ R,以及三种类型的顶点:黑色v_b,白色v_w和随机v_r。游戏由两个玩家,白色和黑色播放:当播放处于白色(黑色)顶点V时,白色(黑色)选择输出弧(V,U)。当游戏处于随机顶点V时,用给定的概率P(v,u)挑选顶点U。在所有情况下,黑色支付白色值R(v,u)。剧本永远继续,白色的目的是最大化(黑色旨在最小化)限制平均值(即平均)的回报。它最近在[7]中,BWR-Games与经典吉列游戏相同,包括许多着名的子类,例如循环游戏,简单的随机游戏(SSG),随机奇偶校验游戏和马尔可夫决策过程。在本文中,我们提供了一种新的识别遍历案例中的BWR-Games的新算法,即最佳值不依赖于初始位置。我们的算法通过将潜在转换减少到一个规范形式的规范形式来解决BWR-游戏,其中两个玩家的最佳策略和每个初始位置的值都很明显,因为它在整体上是最佳的游戏。我们表明,当随机节点的数量是恒定的时,该算法是伪多项式。我们还提供了几乎与其运行时间的较低界限,并表明这绑定的是更广泛的算法。让我们补充一般(非ergodic)案例至少与SSG一样硬,没有已知伪多项式算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号