Journal: Neural Networks: The Official Journal of the International Neural Network Society

Dynamic resource allocation during reinforcement learning accounts for ramping and phasic dopamine activity


Abstract

For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow a focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remains unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent redistributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error. © 2020 Elsevier Ltd. All rights reserved.
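
For readers unfamiliar with the standard TD learning model that the abstract takes as its starting point, the following is a minimal sketch of textbook tabular TD(0), not the authors' modified model. It assumes one state per time step in a short chain with a single reward at the end; the function name `td0_episode`, the chain length, the learning rate `alpha`, and the discount `gamma` are illustrative choices that do not come from the paper. The TD error `delta` is the quantity usually compared with phasic dopamine responses.

```python
def td0_episode(values, rewards, alpha=0.1, gamma=0.9):
    """Run one pass through a chain of states, updating values in place.

    values[t] is the predicted discounted future reward at step t.
    The state after the last step is terminal and has value 0.
    Returns the TD error (prediction error) observed at every step.
    """
    n = len(rewards)
    deltas = []
    for t in range(n):
        next_value = values[t + 1] if t + 1 < n else 0.0
        # TD error: reward received plus discounted next prediction,
        # minus the current prediction.
        delta = rewards[t] + gamma * next_value - values[t]
        values[t] += alpha * delta
        deltas.append(delta)
    return deltas


# Hypothetical task: 10 time steps, reward delivered only at the last one.
n_steps = 10
rewards = [0.0] * (n_steps - 1) + [1.0]
values = [0.0] * n_steps

first = td0_episode(values, rewards)   # errors on the very first trial
for _ in range(2000):
    last = td0_episode(values, rewards)  # errors after extensive training

print("first-trial TD errors:", [round(d, 3) for d in first])
print("late-trial TD errors: ", [round(d, 3) for d in last])
print("learned values:       ", [round(v, 3) for v in values])
```

In this sketch the prediction error initially appears only at the reward and shrinks toward zero as the reward becomes predicted, while the learned values rise toward the reward time. How the paper's model adds environmental stimuli to this setup is not specified in the abstract, so it is not reproduced here.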
