Journal: Machine Learning

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents

Abstract

Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games, assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
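The basic idea stated in the abstract (adapt when the others appear stationary, otherwise retreat to a precomputed equilibrium strategy) can be illustrated with a heavily simplified sketch. This is not the AWESOME algorithm as specified in the paper: the two-player setting, the epoch-to-epoch comparison, the fixed threshold `epsilon`, and all function names below are assumptions made only for illustration.

```python
from collections import Counter


def empirical_distribution(actions, n_actions):
    """Relative frequency of each action index observed in one epoch."""
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in range(n_actions)]


def max_distance(p, q):
    """Largest absolute difference between two distributions."""
    return max(abs(x - y) for x, y in zip(p, q))


def best_response(payoff, opponent_dist):
    """Index of our action with the highest expected payoff.

    payoff[a][b] is our payoff when we play a and the opponent plays b.
    """
    expected = [sum(u * q for u, q in zip(row, opponent_dist)) for row in payoff]
    return max(range(len(payoff)), key=lambda a: expected[a])


def next_strategy(prev_epoch, curr_epoch, payoff, equilibrium, epsilon):
    """Hypothetical decision rule: adapt if the opponent looks stationary.

    prev_epoch and curr_epoch are lists of observed opponent actions
    (actual actions only, not mixed strategies). If their empirical
    distributions differ by at most epsilon, return a pure best response
    to the current one; otherwise fall back to the precomputed
    equilibrium strategy.
    """
    n_opp = len(payoff[0])
    prev = empirical_distribution(prev_epoch, n_opp)
    curr = empirical_distribution(curr_epoch, n_opp)
    if max_distance(prev, curr) <= epsilon:
        br = best_response(payoff, curr)
        return [1.0 if a == br else 0.0 for a in range(len(payoff))]
    return list(equilibrium)


# Matching pennies from the row player's point of view (illustrative data).
MATCHING_PENNIES = [[1, -1], [-1, 1]]
EQUILIBRIUM = [0.5, 0.5]

# Opponent plays action 0 three times out of four in both epochs.
adapt = next_strategy([0, 0, 0, 1], [0, 0, 1, 0], MATCHING_PENNIES, EQUILIBRIUM, 0.1)
# Opponent switches from always 0 to always 1 between epochs.
retreat = next_strategy([0, 0, 0, 0], [1, 1, 1, 1], MATCHING_PENNIES, EQUILIBRIUM, 0.1)
```

In the first call the opponent's observed play looks stationary, so the sketch returns a pure best response; in the second the distributions differ sharply, so it returns the equilibrium strategy. The sketch omits everything that the paper's guarantees would rest on and only mirrors the one-sentence description in the abstract.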
