The Journal of Neuroscience

Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based But Not Model-Free Reinforcement Learning

Abstract

In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
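The contrast between value-based and identity-based prediction errors can be illustrated with a minimal sketch. It is not the authors' model: it assumes single-step trials (so the temporal difference error reduces to a simple delta rule), a summed prediction over the cues in a compound, and hypothetical reward magnitudes and learning rate. The function name `train_compound` and all parameter values are illustrative only.

```python
def train_compound(v_visual, reward_value, alpha=0.2, trials=50):
    """Train a visual + auditory compound and return the auditory cue's value.

    Assumption: each trial is a single step, so the TD error is
    delta = reward - V(compound), with V(compound) the sum of cue values.
    """
    v_auditory = 0.0  # the novel auditory cue starts with no learned value
    for _ in range(trials):
        delta = reward_value - (v_visual + v_auditory)  # scalar value error
        v_visual += alpha * delta
        v_auditory += alpha * delta
    return v_auditory


# The visual cue is assumed to be pretrained to predict 1 unit of reward.
# Same value during compound training: a change in reward identity alone
# leaves the scalar error at zero, so the auditory cue stays blocked.
blocked = train_compound(v_visual=1.0, reward_value=1.0)

# Larger reward during compound training: the positive value error
# unblocks learning about the auditory cue.
upshift = train_compound(v_visual=1.0, reward_value=3.0)

print(f"identity change only: {blocked:.3f}")
print(f"value upshift:        {upshift:.3f}")
```

Under these assumptions the auditory cue acquires value only when the reward quantity changes. A learner that tracks nothing but scalar value cannot register a change in reward identity, which is why identity unblocking points to information from a model-based representation.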