Published in: Proceedings of the Tenth ACM SIGEVO workshop on Foundations of genetic algorithms

Stability of learning dynamics in two-agent, imperfect-information games

Abstract

One issue in multi-agent co-adaptive learning concerns convergence. When two (or more) agents play a game with different information and different payoffs, the general behaviour tends to be oscillation around a Nash equilibrium. Several algorithms have been proposed to force convergence to mixed-strategy Nash equilibria in imperfect-information games when the agents are aware of their opponent's strategy. We consider the effect on one such algorithm, the lagging anchor algorithm, when each agent must also infer the gradient information from observations, in the infinitesimal time-step limit. Use of an estimated gradient, either by opponent modelling or stochastic gradient ascent, destabilises the algorithm in a region of parameter space. There are two phases of behaviour. If the rate of estimation is low, the Nash equilibrium becomes unstable in the mean. If the rate is high, the Nash equilibrium is an attractive fixed point in the mean, but the uncertainty acts as narrow-band coloured noise, which causes dampened oscillations.
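
The oscillation around a Nash equilibrium, and the stabilising effect of an anchor term, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the game (matching pennies), the function names, the parameter values, and the particular update form, in which each strategy is pulled toward a slowly moving average of itself. It uses exact gradients and a small Euler step, so it does not reproduce the estimated-gradient setting the abstract analyses.

```python
def simulate(anchor_rate, steps=2000, dt=0.01, p0=0.6, q0=0.4):
    """Euler-integrate gradient play in matching pennies (illustrative only).

    p and q are the probabilities that players 1 and 2 choose heads.
    Player 1's expected payoff is (2p - 1)(2q - 1); player 2 receives
    the negative, so the mixed Nash equilibrium is p = q = 0.5.
    anchor_rate = 0 gives plain gradient ascent; anchor_rate > 0 adds
    an assumed lagging-anchor-style pull toward a moving average.
    """
    p, q = p0, q0
    p_anchor, q_anchor = p0, q0  # anchors start at the initial strategies
    for _ in range(steps):
        # Exact payoff gradients with respect to each player's own strategy.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = -2.0 * (2.0 * p - 1.0)
        # Strategy update: gradient step plus attraction to the anchor.
        dp = grad_p + anchor_rate * (p_anchor - p)
        dq = grad_q + anchor_rate * (q_anchor - q)
        # Anchor update: each anchor drifts slowly toward its strategy.
        dp_anchor = anchor_rate * (p - p_anchor)
        dq_anchor = anchor_rate * (q - q_anchor)
        # Clip so that p and q remain valid probabilities.
        p = min(1.0, max(0.0, p + dt * dp))
        q = min(1.0, max(0.0, q + dt * dq))
        p_anchor += dt * dp_anchor
        q_anchor += dt * dq_anchor
    return p, q


def distance_from_nash(p, q):
    """Euclidean distance of (p, q) from the equilibrium (0.5, 0.5)."""
    return ((p - 0.5) ** 2 + (q - 0.5) ** 2) ** 0.5


if __name__ == "__main__":
    print("plain gradient:", distance_from_nash(*simulate(anchor_rate=0.0)))
    print("with anchor:   ", distance_from_nash(*simulate(anchor_rate=1.0)))
```

In this toy setting the plain gradient run keeps circling the equilibrium without approaching it, while the anchored run spirals in. The abstract's result concerns what happens to that convergence once the gradient has to be estimated from observations.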