Journal: IBRO Reports

Neural circuits for visual stimulus competition in zebrafish

Abstract

The need for reinforcement signals other than dopamine

Christopher D. Fiorillo
KAIST, Daejeon, Republic of Korea

The brain must learn to approach reward and avoid punishment. Computational and physiological models explain how this could occur through reinforcement signals that indicate good or bad and cause Hebbian reinforcement learning (RL). These models are supported by overwhelming experimental evidence that midbrain dopamine neurons signal a 'reward prediction error' (RPE) that causes the positive reinforcement of stimuli and actions. There is far less experimental evidence concerning the relation of the dopamine RPE to learning aversiveness, and absence of reward. In standard RL models, the dopamine RPE is sufficient for all reinforcement. These models assume that one dopamine RPE signals 'total value,' in which reward and aversiveness are merely opposites on a single dimension of value (analogous to light and dark on the single dimension of light intensity). In contrast, I will present the theoretical rationale for why reward and aversiveness should be two distinct dimensions of value, and why the learning of reward and aversive value calls for four discrete types of reinforcement signals, representing evidence for reward, against reward, for aversiveness, and against aversiveness. Carefully controlled experiments indicate that the well-characterized dopamine RPE signals only evidence for reward, and therefore plays a more restricted role in RL than commonly believed. Greater effort is needed to characterize the other three types of reinforcement signals.
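The contrast drawn in the abstract between a single 'total value' error and four separate reinforcement signals can be illustrated with a small sketch. This is not the author's model: it assumes a simple Rescorla-Wagner-style prediction error on each dimension, and every name in it (`Signals`, `single_rpe`, `four_signals`, `update`, the learning rate `alpha`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Four non-negative reinforcement signals (illustrative names)."""
    for_reward: float
    against_reward: float
    for_aversive: float
    against_aversive: float


def single_rpe(reward, aversive, v_total):
    """Single-dimension error: aversiveness is treated as negative reward."""
    return (reward - aversive) - v_total


def four_signals(reward, aversive, v_reward, v_aversive):
    """Assumed scheme: one prediction error per dimension, each split by sign."""
    d_reward = reward - v_reward
    d_aversive = aversive - v_aversive
    return Signals(
        for_reward=max(d_reward, 0.0),
        against_reward=max(-d_reward, 0.0),
        for_aversive=max(d_aversive, 0.0),
        against_aversive=max(-d_aversive, 0.0),
    )


def update(v_reward, v_aversive, s, alpha=0.1):
    """Move each value estimate by its own pair of signals."""
    v_reward += alpha * (s.for_reward - s.against_reward)
    v_aversive += alpha * (s.for_aversive - s.against_aversive)
    return v_reward, v_aversive
```

Under these assumptions, an outcome that is equally rewarding and aversive gives `single_rpe(1.0, 1.0, 0.0) == 0.0`, so a single-dimension learner sees nothing to learn, whereas `four_signals(1.0, 1.0, 0.0, 0.0)` reports evidence for reward and evidence for aversiveness at the same time.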
