...
Journal: PLoS Computational Biology

Spike-based Decision Learning of Nash Equilibria in Two-Player Games


Abstract

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
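
To make the notion of a mixed (stochastic) Nash equilibrium concrete, the following minimal sketch computes one analytically for a 2x2 game from the indifference conditions. It is not the spiking-neuron learning procedure described in the abstract. The function name `mixed_nash_2x2` and the inspector-style payoff values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mixed_nash_2x2(row_payoff: Matrix, col_payoff: Matrix) -> Tuple[float, float]:
    """Return (p, q) for a fully mixed equilibrium of a 2x2 game.

    p is the probability that the row player picks row 0,
    q is the probability that the column player picks column 0.
    Raises ValueError if no fully mixed equilibrium exists.
    """
    a, b = row_payoff, col_payoff
    # p must make the column player indifferent between the two columns.
    denom_p = b[0][0] - b[1][0] - b[0][1] + b[1][1]
    # q must make the row player indifferent between the two rows.
    denom_q = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    if denom_p == 0 or denom_q == 0:
        raise ValueError("indifference conditions are degenerate")
    p = (b[1][1] - b[1][0]) / denom_p
    q = (a[1][1] - a[0][1]) / denom_q
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no fully mixed equilibrium")
    return p, q


# Hypothetical inspector-style payoffs (assumed values, not from the paper).
# Rows: employee shirks (0) or works (1).
# Columns: employer inspects (0) or does not inspect (1).
EMPLOYEE = [[0, 4], [2, 2]]
EMPLOYER = [[-1, -4], [1, 2]]

p_shirk, q_inspect = mixed_nash_2x2(EMPLOYEE, EMPLOYER)
print(f"P(shirk) = {p_shirk}, P(inspect) = {q_inspect}")
```

With these assumed payoffs no pure strategy pair is stable, so each player has to randomize. A learner that settles on a deterministic policy cannot reach such an equilibrium, which is the difficulty the abstract attributes to the stochastic strategy.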
