Journal: The Journal of Neuroscience

Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network



Abstract

Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings in conscious rats showed that DA cells retain responses to predicted reward after responses to conditioned cues have developed, at least early in training. This contrasts with previous TD models that predict a gradual stepwise shift in latency with responses to rewards lost before responses develop to the conditioned cue. By exploring the TD parameter space, we demonstrate that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter λ) and low learning rate (α). These physiological constraints for TD parameters suggest that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain. Such properties enable rapid but stable initiation of learning when the number of stimulus-reward pairings is limited, conferring significant adaptive advantages in real-world environments.
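The contrast the abstract draws between TD models with and without eligibility traces can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes a tabular TD(λ) learner with one feature per time step after cue onset (a delay-line representation), accumulating traces reset on each trial, and illustrative values for the trial length, cue time, reward time, α, λ, and γ. The function name `simulate_td_lambda` and all of its parameters are hypothetical. The TD error at a given time step stands in for the phasic DA response.

```python
import numpy as np


def simulate_td_lambda(n_trials=50, n_steps=20, cue_t=5, reward_t=15,
                       alpha=0.05, lam=0.9, gamma=1.0):
    """Simulate cue-reward conditioning with tabular TD(lambda).

    Returns an array of TD errors with shape (n_trials, n_steps), where
    entry [k, t] is the error computed when time step t is reached in trial k.
    All parameter values are illustrative assumptions.
    """
    w = np.zeros(n_steps)                 # one value weight per time step
    deltas = np.zeros((n_trials, n_steps))

    for trial in range(n_trials):
        e = np.zeros(n_steps)             # eligibility trace, reset per trial
        for t in range(n_steps - 1):
            # Before cue onset no stimulus feature is active, so value is 0.
            v_t = w[t] if t >= cue_t else 0.0
            v_next = w[t + 1] if t + 1 >= cue_t else 0.0
            r = 1.0 if t + 1 == reward_t else 0.0

            # TD error for the transition t -> t + 1.
            delta = r + gamma * v_next - v_t

            # Decay all traces, then mark the current state as eligible.
            e *= gamma * lam
            if t >= cue_t:
                e[t] += 1.0

            # Every eligible state shares in the update.
            w += alpha * delta * e
            deltas[trial, t + 1] = delta

    return deltas


if __name__ == "__main__":
    cue_t, reward_t = 5, 15
    for lam in (0.0, 0.9):
        d = simulate_td_lambda(cue_t=cue_t, reward_t=reward_t, lam=lam)
        print(f"lambda={lam}")
        for k in (0, 1, 10, 49):
            print(f"  trial {k + 1:2d}: cue error={d[k, cue_t]:.3f}, "
                  f"reward error={d[k, reward_t]:.3f}")
```

In this sketch, with λ = 0 the value estimate moves back by one time step per trial, so no error appears at the cue until it has crossed the whole cue-reward interval. With λ > 0, the first rewarded trial already credits the cue state, so a cue error appears on the next trial while the reward error is still large, which is the qualitative pattern the abstract attributes to long-lasting eligibility traces combined with a low learning rate.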
