Dopamine and inference about timing


Abstract

Several investigators have suggested that the primate dopamine system carries an error signal for learning to predict future rewards [1, 2, 3]. These models, based on temporal-difference (TD) learning [4], explain most phasic responses of primate dopamine neurons in appetitive conditioning [5]; moreover, they suggest a neurophysiological account of animal conditioning behavior. But because existing models are based in the simple formal setting of Markov processes, they are deficient in at least two areas relevant to physiological and behavioral data. They do not provide a realistic account of the partial observability of the state of the world, nor of how the system tracks the timing of events. In this paper, we introduce a version of TD learning grounded in a richer formal model to better address both issues and, consequently, to explain some data that challenge existing models.
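
For orientation, the error signal these models refer to can be sketched with plain tabular TD(0) on a single conditioning trial. This is a minimal illustration of the standard Markov-setting update that the abstract describes as limited, not the richer model the paper introduces. The function name, the trial layout (one state per time step after cue onset, one reward at a fixed step), and the parameter values are all assumptions chosen for the example.

```python
def td_errors_over_trials(n_steps=5, reward_step=4, n_trials=200,
                          alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a fixed-length trial (hypothetical setup).

    State t means "t time steps after cue onset". A reward of 1.0 is
    delivered on the transition out of state `reward_step`.
    Returns the learned values and the per-trial list of TD errors.
    """
    # values[n_steps] is the terminal state; it is never updated and stays 0.
    values = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(n_steps):
            reward = 1.0 if t == reward_step else 0.0
            # TD error: delta = r + gamma * V(next state) - V(current state)
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return values, history


values, history = td_errors_over_trials()
print("first trial TD errors:", history[0])
print("last trial TD errors: ", [round(d, 4) for d in history[-1]])
print("learned values:       ", [round(v, 4) for v in values])
```

In this sketch the error at the reward step starts at 1 on the first trial and shrinks toward 0 as the reward becomes predicted, while the learned value of the earliest post-cue state grows toward 1. Every state here is fully observed and time is represented by a fixed chain of states, which are exactly the two simplifications the abstract identifies as shortcomings.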
