An implementation of reinforcement learning based on spike timing dependent plasticity

Patrick D. Roberts; Roberto A. Santiago; Gerardo Lafferriere

Abstract

An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing-dependent plasticity (STDP) can result in long-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of macaque monkeys during reinforcement learning. While some research has empirically observed a connection between STDP and TD, there has not been an explanatory model directly connecting TD to STDP. Through analysis of the learning dynamics that result from a general form of an STDP learning rule, the connection between STDP and TD is explained. We further demonstrate that an STDP learning rule drives the spike probability of a reward-predicting neuronal population to a stable equilibrium. The equilibrium solution has an increasing slope whose steepness predicts the probability of the reward. This is similar to results from electrophysiological recordings, which suggest a different slope that predicts the value of the anticipated reward (Montague and Berns [Neuron 36(2):265–284, 2002]). This connection begins to shed light on more recent data gathered from the VTA and OFC that are not well modeled by TD. We suggest that STDP provides the underlying mechanism that explains reinforcement learning and other higher-level perceptual and cognitive functions.
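The abstract does not state the general STDP rule the authors analyze, so the following Python sketch only illustrates the two textbook ingredients the paper relates: a pair-based exponential STDP window and tabular TD(0) value learning. The window shape, the amplitudes `a_plus`/`a_minus`, the time constants, the deterministic chain environment, and the function names `stdp_weight_change` and `td0_values` are all hypothetical choices for illustration, not the model of Roberts, Santiago and Lafferriere.

```python
import math


def stdp_weight_change(delta_t, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Pair-based exponential STDP window (illustrative assumption).

    delta_t = t_post - t_pre in ms. Pre-before-post (delta_t > 0) potentiates,
    post-before-pre (delta_t < 0) depresses. Parameter values are hypothetical,
    not taken from the paper.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def td0_values(n_states, reward, gamma=0.9, alpha=0.1, episodes=500):
    """Tabular TD(0) on a hypothetical deterministic chain 0 -> 1 -> ... -> n-1.

    The reward is delivered on the transition out of the last state, which
    then terminates (its successor value is taken as 0).
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            terminal = s == n_states - 1
            r = reward if terminal else 0.0
            next_v = 0.0 if terminal else values[s + 1]
            delta = r + gamma * next_v - values[s]  # TD prediction error
            values[s] += alpha * delta              # move V(s) toward target
    return values


if __name__ == "__main__":
    # Values converge to gamma**(n-1-s) * reward: a ramp rising toward reward time.
    print(td0_values(n_states=5, reward=1.0))
    print(stdp_weight_change(10.0), stdp_weight_change(-10.0))
```

Passing `reward=p` mimics the expected value of a unit reward delivered with probability p; the whole ramp then scales by p, loosely echoing the abstract's point that slope steepness tracks reward probability. This is a TD illustration only and does not reproduce the paper's analysis of the STDP equilibrium.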

Bibliographic details

Source: Biological Cybernetics, 2008, no. 6, pp. 517–523 (7 pages)
Authors: Patrick D. Roberts; Roberto A. Santiago; Gerardo Lafferriere
Affiliations:
Department of Science and Engineering, Oregon Health and Science University, Portland, OR 97239, USA
Systems Science Program, Portland State University, Portland, OR 97207, USA
Department of Mathematics and Statistics, Portland State University, Portland, OR 97207, USA

Record information
Original format: PDF
Text language: English
Keywords: Computational neuroscience; Learning; Synaptic plasticity; Spiking neuron model

机译：计算神经科学;学习;突触可塑性;尖峰神经元模型;

Similar literature

1. An implementation of reinforcement learning based on spike timing dependent plasticity [J]. Roberts PD, Santiago RA, Lafferriere G. Biological Cybernetics: Communication and Control in Organisms and Automata = Nachrichtenübertragung, Nachrichtenverarbeitung, Steuerung und Regelung in Organismen und in Automaten. 2008, no. 6

2. Spike timing dependent plasticity implements reinforcement learning [J]. Roberto A Santiago, Patrick D Roberts, Gerardo Lafferriere. BMC Neuroscience. 2007, Suppl. 2

3. Digital implementation of a spiking neural network (SNN) capable of spike-timing-dependent plasticity (STDP) learning [C]. Di Hu, Xu Zhang, Ziye Xu, IEEE International Conference on Nanotechnology. 2014

4. Analog Spiking Neural Network Implementing Spike Timing-Dependent Plasticity on 65 nm CMOS [D]. Vincent, Luke. 2021

5. Spike timing dependent plasticity implements reinforcement learning [O]. Roberto A Santiago, Patrick D Roberts, Gerardo Lafferriere. 2007

6. Spike timing dependent plasticity implements reinforcement learning [O]. 2007