Frontiers in Behavioral Neuroscience

Dopamine-Mediated Learning and Switching in Cortico-Striatal Circuit Explain Behavioral Changes in Reinforcement Learning



Abstract

The basal ganglia are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine (DA) D1 and D2 receptors located at the cortico-striatal synapses. However, it is still unclear how this DA-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds for D1- and D2-mediated synaptic plasticity, which are antagonized by DA-independent synaptic plasticity. A phasic increase in DA release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in DA release caused by a smaller-than-expected reward induces a cessation of long-term depression (LTD), leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task in which the animal makes shorter-latency saccades to reward locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism that selectively activates the cortico-striatal circuit. Our model also shows how D1- or D2-receptor blocking experiments selectively affect either reward or no-reward trials. The proposed mechanisms also explain the behavioral changes in Parkinson's disease.
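
The threshold-based plasticity rule described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the abstract gives no equations, so the update form, the function and parameter names (`update_weights`, `PlasticityParams`), and every numeric value below are assumptions chosen only to show the qualitative behavior. A DA burst above an assumed D1 threshold gives net LTP in the direct pathway. A DA dip below an assumed D2 threshold stops D2-mediated LTD, so the DA-independent LTP wins in the indirect pathway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlasticityParams:
    # All values are illustrative assumptions, in arbitrary units.
    theta_d1: float = 1.2            # DA level above which D1-mediated LTP is active
    theta_d2: float = 0.8            # DA level below which D2-mediated LTD ceases
    d1_ltp: float = 0.10             # size of D1-mediated LTP (direct pathway)
    d2_ltd: float = 0.05             # size of D2-mediated LTD (indirect pathway)
    indep_ltd_direct: float = 0.02   # DA-independent LTD opposing D1-mediated LTP
    indep_ltp_indirect: float = 0.05 # DA-independent LTP opposing D2-mediated LTD


def clip(w, lo=0.0, hi=1.0):
    """Keep a synaptic weight inside [lo, hi]."""
    return max(lo, min(hi, w))


def update_weights(w_direct, w_indirect, da, p=PlasticityParams()):
    """Apply one trial's plasticity given the phasic DA level `da`."""
    # D1-mediated LTP only switches on for DA above its (higher) threshold.
    d1_term = p.d1_ltp if da > p.theta_d1 else 0.0
    # D2-mediated LTD stays on unless DA dips below its (lower) threshold.
    d2_term = p.d2_ltd if da > p.theta_d2 else 0.0
    # Each DA-dependent term is opposed by a DA-independent term.
    dw_direct = d1_term - p.indep_ltd_direct
    dw_indirect = p.indep_ltp_indirect - d2_term
    return clip(w_direct + dw_direct), clip(w_indirect + dw_indirect)


# Larger-than-expected reward: DA burst (assumed level 1.5).
burst_direct, burst_indirect = update_weights(0.5, 0.5, da=1.5)
# Smaller-than-expected reward: DA dip (assumed level 0.5).
dip_direct, dip_indirect = update_weights(0.5, 0.5, da=0.5)
```

With these assumed values, the burst raises the direct-pathway weight and leaves the indirect-pathway weight unchanged, while the dip raises the indirect-pathway weight. The exact balance between the DA-dependent and DA-independent terms at baseline is a modeling choice that the abstract does not specify.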
