
A dynamical policy search model for matching law


Abstract

The matching law states that the fraction of choices made to any option matches the fraction of total rewards earned from that option. However, matching behavior does not necessarily yield the optimal reward, and it is unclear why subjects frequently exhibit matching behavior rather than optimal behavior. In this study, an optimal algorithm is proposed on the basis of the policy search model in reinforcement learning, and the policy algorithm leading to the matching law is derived from this optimal algorithm. Theoretical analysis and simulation results show that the decision behavior produced by our algorithm conforms to the matching law under many kinds of reward schedules. Our results indicate that the matching law can be exhibited whenever the subject tries to maximize a value function under the simple assumption that past choice behavior does not take into account the value of future long-run reward. These results unveil the relationship between matching behavior and the algorithm of optimal policy search.
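To make the opening definition concrete, the following minimal sketch checks whether a set of observed counts satisfies the matching law, i.e. whether each option's share of choices equals its share of earned rewards. It is only an illustration of the definition, not the policy search algorithm proposed in the paper; the function names `choice_and_reward_fractions` and `matching_gap` and the example counts are hypothetical.

```python
def choice_and_reward_fractions(choices, rewards):
    """Return per-option choice fractions and reward fractions.

    choices[i]: number of times option i was chosen (hypothetical data)
    rewards[i]: total reward earned from option i (hypothetical data)
    """
    total_choices = sum(choices)
    total_rewards = sum(rewards)
    if total_choices == 0 or total_rewards == 0:
        raise ValueError("need at least one choice and one reward")
    choice_frac = [c / total_choices for c in choices]
    reward_frac = [r / total_rewards for r in rewards]
    return choice_frac, reward_frac


def matching_gap(choices, rewards):
    """Largest absolute difference between choice and reward fractions.

    A gap of zero means the counts satisfy the matching law exactly.
    """
    choice_frac, reward_frac = choice_and_reward_fractions(choices, rewards)
    return max(abs(c - r) for c, r in zip(choice_frac, reward_frac))


if __name__ == "__main__":
    # 30 of 40 choices and 15 of 20 rewards go to option 0: both shares are 0.75.
    print(matching_gap([30, 10], [15, 5]))
    # Equal choices but unequal rewards: the shares differ, so matching fails.
    print(matching_gap([20, 20], [15, 5]))
```

A gap near zero indicates matching; it says nothing about whether the total reward is optimal, which is the distinction the abstract draws.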
