Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity

Abstract

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules share several features with plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest experimental investigation of the existence of reward-modulated STDP in animals.
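The second rule described in the abstract gates an exponentially decaying per-synapse eligibility trace by a global reward signal. The sketch below is a minimal illustration of that idea, not the paper's exact derivation; the parameter values, the `Synapse` class, and the simple pair-based STDP kernel are illustrative assumptions.

```python
import math

# Illustrative parameters (not taken from the paper).
A_PLUS, A_MINUS = 1.0, 1.0        # STDP amplitudes for potentiation / depression
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # STDP time constants (ms)
TAU_E = 500.0                     # eligibility-trace decay constant (ms)
ETA = 0.01                        # learning rate

def stdp(dt):
    """Pair-based STDP kernel. dt = t_post - t_pre (ms).

    Pre-before-post (dt >= 0) contributes positively,
    post-before-pre contributes negatively.
    """
    if dt >= 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    return -A_MINUS * math.exp(dt / TAU_MINUS)

class Synapse:
    """Synapse with a decaying eligibility trace (hypothetical helper class)."""

    def __init__(self, w=0.5):
        self.w = w     # synaptic weight
        self.e = 0.0   # eligibility trace

    def on_spike_pair(self, dt, elapsed_ms):
        # Decay the trace by the time since the last update,
        # then accumulate this pair's STDP contribution.
        self.e *= math.exp(-elapsed_ms / TAU_E)
        self.e += stdp(dt)

    def on_reward(self, r):
        # The weight change is the product of the global reward signal
        # and the locally stored trace, so a delayed reward can still
        # credit recently correlated spike pairs.
        self.w += ETA * r * self.e
```

Setting `TAU_E` small (so the trace vanishes between pairing and reward) recovers plain modulated STDP, where only coincident reward can change the weight.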

Record details

  • Source
    《Neural computation》 | 2007, No. 6 | pp. 1468-1502 | 35 pages
  • Author

    Razvan V. Florian;

  • Affiliation

    Center for Cognitive and Neural Studies (Coneural), 400504 Cluj-Napoca, Romania;

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: English
  • Classification: Artificial intelligence theory
  • Keywords
