
Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI



Abstract

Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
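To make the computational distinction concrete, the sketch below (an illustration, not code from the paper) implements the two update rules in their textbook form: the SVPE used by an actor/critic's critic, r + γV(s′) − V(s), which is independent of the chosen action, and the AVPE of Q-learning, r + γ max Q(s′, ·) − Q(s, a), which is specific to the action taken. The parameters and the toy transition are illustrative assumptions.

```python
import numpy as np

# Minimal sketch contrasting the two prediction-error signals the study
# dissociates. Values below (gamma, alpha, the toy transition) are
# illustrative assumptions, not parameters from the paper.

n_states, n_actions = 2, 2
gamma, alpha = 0.9, 0.1              # discount factor, learning rate

V = np.zeros(n_states)               # critic's state values
Q = np.zeros((n_states, n_actions))  # action values

def svpe(r, s, s_next):
    """State-value-prediction error: depends only on state values,
    not on which action was taken (actor/critic signature)."""
    return r + gamma * V[s_next] - V[s]

def avpe(r, s, a, s_next):
    """Action-value-prediction error: computed for the specific
    action taken (Q-learning-style signature)."""
    return r + gamma * Q[s_next].max() - Q[s, a]

# One illustrative transition: in state s, take action a, receive
# reward r, and land in s_next; both systems update in parallel,
# mirroring the paper's conclusion that the two mechanisms coexist.
s, a, r, s_next = 0, 1, 1.0, 1
delta_v = svpe(r, s, s_next)
delta_q = avpe(r, s, a, s_next)
V[s] += alpha * delta_v       # critic update
Q[s, a] += alpha * delta_q    # action-value update
print(f"SVPE = {delta_v:.2f}, AVPE = {delta_q:.2f}")
```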
