Home > Foreign-language journals > The Journal of Neuroscience: The Official Journal of the Society for Neuroscience > Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs

Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs



Abstract

Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL-related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.
