Source: Proceedings of the National Academy of Sciences of the United States of America

Dopamine neurons learn to encode the long-term value of multiple future rewards



Abstract

Midbrain dopamine neurons signal reward value, their prediction error, and the salience of events. If they play a critical role in achieving specific distant goals, long-term future rewards should also be encoded as suggested in reinforcement learning theories. Here, we address this experimentally untested issue. We recorded 185 dopamine neurons in three monkeys that performed a multistep choice task in which they explored a reward target among alternatives and then exploited that knowledge to receive one or two additional rewards by choosing the same target in a set of subsequent trials. An analysis of anticipatory licking for reward water indicated that the monkeys did not anticipate an immediately expected reward in individual trials; rather, they anticipated the sum of immediate and multiple future rewards. In accordance with this behavioral observation, the dopamine responses to the start cues and reinforcer beeps reflected the expected values of the multiple future rewards and their errors, respectively. More specifically, when monkeys learned the multistep choice task over the course of several weeks, the responses of dopamine neurons encoded the sum of the immediate and expected multiple future rewards. The dopamine responses were quantitatively predicted by theoretical descriptions of the value function with time discounting in reinforcement learning. These findings demonstrate that dopamine neurons learn to encode the long-term value of multiple future rewards with distant rewards discounted.
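The abstract states that dopamine start-cue responses tracked the time-discounted sum of the immediate and future rewards, and that reinforcer responses tracked the prediction error of that sum. As a rough illustration only, the sketch below writes out the standard discounted value function and temporal-difference error from reinforcement learning, the framework the authors invoke; the discount factor and unit reward magnitude are hypothetical placeholders, not the paper's fitted values.

```python
# Minimal sketch of the discounted value function described in the abstract.
# Assumptions (not from the paper): gamma = 0.8 per trial and unit reward
# magnitude are illustrative placeholders.

GAMMA = 0.8  # per-trial discount factor, 0 < gamma < 1


def long_term_value(rewards_remaining, gamma=GAMMA):
    """Discounted sum of the immediate and future unit rewards:
    V = sum_k gamma**k * r_k."""
    return sum(gamma ** k for k in range(rewards_remaining))


def td_error(reward, value_now, value_next, gamma=GAMMA):
    """Temporal-difference (reward prediction) error:
    delta = r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


# In the task, a correct exploratory choice is followed by one or two more
# rewarded exploitation trials, so 1-3 rewards remain at a start cue:
for n in (1, 2, 3):
    print(f"{n} reward(s) remaining: V = {long_term_value(n):.2f}")
```

With any discount factor below 1, the value at the start cue falls as fewer rewards remain in the trial sequence, which is the qualitative pattern the abstract reports: long-term value is encoded with distant rewards discounted.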
