Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity

Abstract

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules share several features with plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest experimental investigation of the existence of reward-modulated STDP in animals.
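The second rule described in the abstract gates an exponentially decaying per-synapse eligibility trace by a global reward signal. The sketch below is a minimal illustration of that idea, not the paper's exact derivation; the parameter values, the `Synapse` class, and the simple pair-based STDP kernel are illustrative assumptions.

```python
import math

# Illustrative parameters (not taken from the paper).
A_PLUS, A_MINUS = 1.0, 1.0        # STDP amplitudes for potentiation / depression
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # STDP time constants (ms)
TAU_E = 500.0                     # eligibility-trace decay constant (ms)
ETA = 0.01                        # learning rate

def stdp(dt):
    """Pair-based STDP kernel. dt = t_post - t_pre (ms).

    Pre-before-post (dt >= 0) contributes positively,
    post-before-pre contributes negatively.
    """
    if dt >= 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    return -A_MINUS * math.exp(dt / TAU_MINUS)

class Synapse:
    """Synapse with a decaying eligibility trace (hypothetical helper class)."""

    def __init__(self, w=0.5):
        self.w = w     # synaptic weight
        self.e = 0.0   # eligibility trace

    def on_spike_pair(self, dt, elapsed_ms):
        # Decay the trace by the time since the last update,
        # then accumulate this pair's STDP contribution.
        self.e *= math.exp(-elapsed_ms / TAU_E)
        self.e += stdp(dt)

    def on_reward(self, r):
        # The weight change is the product of the global reward signal
        # and the locally stored trace, so a delayed reward can still
        # credit recently correlated spike pairs.
        self.w += ETA * r * self.e
```

Setting `TAU_E` small (so the trace vanishes between pairing and reward) recovers plain modulated STDP, where only coincident reward can change the weight.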

Record details

  • Source
    《Neural computation》 | 2007, No. 6 | pp. 1468-1502 | 35 pages
  • Author

    Razvan V. Florian;

  • Affiliation

    Center for Cognitive and Neural Studies (Coneural), 400504 Cluj-Napoca, Romania;

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: English
  • Classification: Artificial intelligence theory
  • Keywords
