Econometrica

Experience-weighted attraction learning in normal form games



Abstract

In 'experience-weighted attraction' (EWA) learning, strategies have attractions that reflect initial predispositions, are updated based on payoff experience, and determine choice probabilities according to some rule (e.g., logit). A key feature is a parameter δ that weights the strength of hypothetical reinforcement of unchosen strategies, according to the payoffs they would have yielded, relative to the reinforcement of chosen strategies according to received payoffs. The other key features are two discount rates, φ and ρ, which discount previous attractions and an experience weight, respectively. EWA includes reinforcement learning and weighted fictitious play (belief learning) as special cases, and hybridizes their key elements. When δ = 0 and ρ = 0, cumulative choice reinforcement results. When δ = 1 and ρ = φ, levels of reinforcement of strategies are exactly the same as expected payoffs given weighted fictitious play beliefs. Using three sets of experimental data, the model's parameters were calibrated on part of each data set and used to predict a holdout sample. Estimates of δ are generally around 0.50, φ around 0.8 to 1, and ρ varies from 0 to φ. The reinforcement and belief-learning special cases are generally rejected in favor of EWA, though belief models do better in some constant-sum games. EWA combines the best features of the previous approaches, allowing attractions to begin and grow flexibly as choice reinforcement does, while reinforcing unchosen strategies substantially, as belief-based models implicitly do.
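The update rule behind this description can be made concrete with a short sketch. The recursion below follows the standard EWA formulation (the experience weight evolves as N(t) = ρ·N(t-1) + 1, and attractions are discounted, experience-weighted averages of received and forgone payoffs); the function names and the NumPy packaging are illustrative choices, not part of the paper.

    import numpy as np

    def ewa_update(A, N, chosen, payoffs, delta, phi, rho):
        # One EWA attraction update for a single player.
        # A:       attractions, one per strategy
        # N:       experience weight N(t-1)
        # chosen:  index of the strategy actually played
        # payoffs: payoff each strategy would have earned against the
        #          opponents' realized play (the chosen strategy's entry
        #          is the payoff actually received)
        N_new = rho * N + 1.0
        weight = np.full_like(A, delta)   # unchosen strategies: hypothetical reinforcement, weighted by delta
        weight[chosen] = 1.0              # chosen strategy: full reinforcement by the received payoff
        A_new = (phi * N * A + weight * payoffs) / N_new
        return A_new, N_new

    def logit_probs(A, lam):
        # Map attractions to choice probabilities with a logit rule.
        z = lam * A - np.max(lam * A)     # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

Setting delta = 0 and rho = 0 (with N(0) = 1) collapses the update to cumulative choice reinforcement, while delta = 1 and rho = phi reproduces the expected payoffs under weighted fictitious play beliefs, matching the two special cases named in the abstract.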
