Stability of learning dynamics in two-agent, imperfect-information games


Abstract

One issue in multi-agent co-adaptive learning concerns convergence. When two (or more) agents play a game with different information and different payoffs, the general behaviour tends to be oscillation around a Nash equilibrium. Several algorithms have been proposed to force convergence to mixed-strategy Nash equilibria in imperfect-information games when the agents are aware of their opponent's strategy. We consider the effect on one such algorithm, the lagging anchor algorithm, when each agent must also infer the gradient information from observations, in the infinitesimal time-step limit. Use of an estimated gradient, either by opponent modelling or stochastic gradient ascent, destabilises the algorithm in a region of parameter space. There are two phases of behaviour. If the rate of estimation is low, the Nash equilibrium becomes unstable in the mean. If the rate is high, the Nash equilibrium is an attractive fixed point in the mean, but the uncertainty acts as narrow-band coloured noise, which causes dampened oscillations.
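The stabilising effect the abstract attributes to the lagging anchor algorithm can be illustrated with a minimal sketch. This is not the paper's model: it assumes matching pennies as the game, exact (not estimated) gradients, a commonly described continuous-time form of the lagging anchor update integrated with Euler steps, and hypothetical names and parameter values (`simulate`, `eta`, `dt`, the starting point). Each player follows its payoff gradient and is pulled towards a slowly moving "anchor" copy of its own strategy; with `eta = 0` the anchor term vanishes and plain gradient play remains.

```python
import math


def simulate(eta, p=0.7, q=0.4, dt=0.01, steps=2000):
    """Euler-integrate gradient play on matching pennies.

    p, q  : probability that player 1 / player 2 plays "heads".
    eta   : anchor coupling rate (illustrative assumption); eta = 0
            disables the anchor and leaves plain gradient ascent.
    Returns the final distance from the mixed Nash equilibrium (0.5, 0.5).
    """
    p_anchor, q_anchor = p, q  # anchors start at the current strategies
    for _ in range(steps):
        # Player 1's expected payoff is (2p - 1)(2q - 1); player 2 gets the negative.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = -2.0 * (2.0 * p - 1.0)
        # Strategy update: payoff gradient plus a pull towards the anchor.
        new_p = p + dt * (grad_p + eta * (p_anchor - p))
        new_q = q + dt * (grad_q + eta * (q_anchor - q))
        # Anchor update: the anchor lags behind the (old) strategy.
        p_anchor += dt * eta * (p - p_anchor)
        q_anchor += dt * eta * (q - q_anchor)
        # Keep the strategies valid probabilities.
        p = min(1.0, max(0.0, new_p))
        q = min(1.0, max(0.0, new_q))
    return math.hypot(p - 0.5, q - 0.5)


def run_demo():
    """Return final distances from equilibrium without and with the anchor."""
    return simulate(eta=0.0), simulate(eta=1.0)


if __name__ == "__main__":
    no_anchor, with_anchor = run_demo()
    print(f"distance from Nash, no anchor:   {no_anchor:.6f}")
    print(f"distance from Nash, with anchor: {with_anchor:.6f}")
```

In this toy setting the run without the anchor keeps circling the equilibrium, while the anchored run spirals into it. The paper's question, what happens when the gradients must themselves be estimated from observations, is not modelled here.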


Bibliographic record

Source
Proceedings of the Tenth ACM SIGEVO workshop on Foundations of genetic algorithms | 2009 | pp. 131-140 | 10 pages
Conference location: Orlando, FL (US)
Authors
John M. Butterworth; Jonathan L. Shapiro
Affiliations
University of Manchester; University of Manchester
Original format: PDF
Language: English
Chinese Library Classification: Genetics
Keywords
co-adapting agents; game theory; reinforcement learning


