Source: Proceedings of the National Academy of Sciences of the United States of America

Dopamine neurons learn to encode the long-term value of multiple future rewards



Abstract

Midbrain dopamine neurons signal reward value, their prediction error, and the salience of events. If they play a critical role in achieving specific distant goals, long-term future rewards should also be encoded as suggested in reinforcement learning theories. Here, we address this experimentally untested issue. We recorded 185 dopamine neurons in three monkeys that performed a multistep choice task in which they explored a reward target among alternatives and then exploited that knowledge to receive one or two additional rewards by choosing the same target in a set of subsequent trials. An analysis of anticipatory licking for reward water indicated that the monkeys did not anticipate an immediately expected reward in individual trials; rather, they anticipated the sum of immediate and multiple future rewards. In accordance with this behavioral observation, the dopamine responses to the start cues and reinforcer beeps reflected the expected values of the multiple future rewards and their errors, respectively. More specifically, when monkeys learned the multistep choice task over the course of several weeks, the responses of dopamine neurons encoded the sum of the immediate and expected multiple future rewards. The dopamine responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. These findings demonstrate that dopamine neurons learn to encode the long-term value of multiple future rewards with distant rewards discounted.
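The abstract states that dopamine start-cue responses tracked the time-discounted sum of the immediate and future rewards, and that reinforcer responses tracked the prediction error of that sum. As a rough illustration only, the sketch below writes out the standard discounted value function and temporal-difference error from reinforcement learning, the framework the authors invoke; the discount factor and unit reward magnitude are hypothetical placeholders, not the paper's fitted values.

```python
# Minimal sketch of the discounted value function described in the abstract.
# Assumptions (not from the paper): gamma = 0.8 per trial and unit reward
# magnitude are illustrative placeholders.

GAMMA = 0.8  # per-trial discount factor, 0 < gamma < 1


def long_term_value(rewards_remaining, gamma=GAMMA):
    """Discounted sum of the immediate and future unit rewards:
    V = sum_k gamma**k * r_k."""
    return sum(gamma ** k for k in range(rewards_remaining))


def td_error(reward, value_now, value_next, gamma=GAMMA):
    """Temporal-difference (reward prediction) error:
    delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


# In the task, a correct exploratory choice is followed by one or two more
# rewarded exploitation trials, so 1-3 rewards remain at a start cue:
for n in (1, 2, 3):
    print(f"{n} reward(s) remaining: V = {long_term_value(n):.2f}")
```

With any discount factor below 1, the value at the start cue falls as fewer rewards remain in the trial sequence, which is the qualitative pattern the abstract reports: long-term value is encoded with distant rewards discounted.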
