Code-specific policy gradient rules for spiking neurons

Abstract

Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes (a spike count code, a spike timing code, and the most general "full spike train" code) and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
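
To make the spike-count case concrete, the following is a minimal sketch (not the paper's exact rule) of a REINFORCE-style update in the spirit of Williams (1992): the policy-gradient identity ∇_w E[R] = E[ R ∇_w log p(n | w) ], applied to a Poisson spike count n with log-link mean mu = exp(w · x), gives ∇_w log p(n | w) = (n − mu) x, i.e., the "observed minus expected" shape the abstract attributes to natural-exponential-family codes. The count model, the toy target-count task, and all names (eta, n_target, baseline) are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed toy setup: one neuron, a fixed input pattern x, and a spike
    # count n ~ Poisson(mu) with log-link mean mu = exp(w . x).
    x = np.array([1.0, 0.5, -0.3])   # fixed presynaptic input pattern
    w = np.zeros(3)                  # synaptic weights (policy parameters)
    eta = 0.05                       # learning rate
    n_target = 5                     # reward is delivered at this count
    baseline = 0.0                   # running reward baseline (variance reduction)

    for trial in range(5000):
        mu = np.exp(w @ x)                   # expected spike count
        n = rng.poisson(mu)                  # sampled count: the "action"
        R = 1.0 if n == n_target else 0.0    # reward depends on the count only
        # Score function of a log-link Poisson count:
        #   d log p(n | w) / dw = (n - mu) * x
        w += eta * (R - baseline) * (n - mu) * x
        baseline += 0.05 * (R - baseline)    # slow exponential average of R

    print("learned mean count:", np.exp(w @ x))  # should settle near n_target

Under this indicator reward, E[R] = P(n = n_target), which a Poisson count maximizes at mu = n_target, so the learned mean count drifts toward the target; subtracting the running baseline leaves the gradient estimate unbiased because E[∇_w log p(n | w)] = 0.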
