PLoS Computational Biology

A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback



Abstract

Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how behaviorally relevant adaptive changes in complex networks of spiking neurons could be achieved in a self-organizing manner through local synaptic plasticity. However, the capabilities and limitations of this learning rule could so far only be tested through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They can also learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker. In this experiment monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit-assignment problem. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity. Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic of cortical networks of neurons but has no analogue in currently existing artificial computing systems. In addition, our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics.
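The learning rule at the center of this abstract can be illustrated with a minimal Python/NumPy sketch. It assumes the common eligibility-trace formulation of reward-modulated STDP, dw/dt = η·c(t)·d(t). Here each synapse's eligibility trace c(t) collects ordinary STDP pairings (computed with exponential pre- and postsynaptic traces) and decays with time constant τ_e, and d(t) is a global reward signal. The function name `reward_modulated_stdp`, the exponential STDP window, the Euler time step and all parameter values are illustrative assumptions. They are not the equations or parameters used in the paper.

```python
import numpy as np


def reward_modulated_stdp(pre_spikes, post_spikes, reward, dt=1e-3,
                          a_plus=1.0, a_minus=1.05,
                          tau_plus=0.02, tau_minus=0.02,
                          tau_e=0.5, eta=0.01, w_init=0.5, w_max=1.0):
    """Hypothetical Euler sketch of dw/dt = eta * c(t) * d(t).

    pre_spikes : (T, N) array of 0/1 presynaptic spikes per time step
    post_spikes: (T,) array of 0/1 postsynaptic spikes
    reward     : (T,) reward signal d(t)
    Returns the final weight vector of shape (N,).
    All kernel shapes and parameter values are illustrative assumptions.
    """
    pre_spikes = np.asarray(pre_spikes, dtype=float)
    post_spikes = np.asarray(post_spikes, dtype=float)
    reward = np.asarray(reward, dtype=float)
    n_steps, n_syn = pre_spikes.shape

    w = np.full(n_syn, w_init)
    x_pre = np.zeros(n_syn)  # presynaptic traces (drive potentiation)
    y_post = 0.0             # postsynaptic trace (drives depression)
    c = np.zeros(n_syn)      # eligibility traces, one per synapse

    decay_plus = np.exp(-dt / tau_plus)
    decay_minus = np.exp(-dt / tau_minus)
    decay_e = np.exp(-dt / tau_e)

    for t in range(n_steps):
        x_pre *= decay_plus
        y_post *= decay_minus
        c *= decay_e

        pre, post = pre_spikes[t], post_spikes[t]
        # STDP pairings are written into the eligibility trace, not the weight:
        # pre-before-post adds a_plus * x_pre, post-before-pre subtracts a_minus * y_post.
        c += a_plus * x_pre * post - a_minus * y_post * pre

        # Register this step's spikes after pairing, so a spike never pairs with itself.
        x_pre += pre
        y_post += post

        # The reward signal gates whether stored eligibility becomes a weight change.
        w += eta * reward[t] * c * dt
        np.clip(w, 0.0, w_max, out=w)

    return w


# Usage: two synapses onto one neuron, 2 s at dt = 1 ms.
T = 2000
post = np.zeros(T)
post[50::100] = 1        # postsynaptic spike every 100 ms
pre = np.zeros((T, 2))
pre[45::100, 0] = 1      # synapse 0: presynaptic spike 5 ms before each post spike
pre[55::100, 1] = 1      # synapse 1: presynaptic spike 5 ms after each post spike

# With a constant positive reward, synapse 0 should end above 0.5 and synapse 1 below it.
print(reward_modulated_stdp(pre, post, np.ones(T)))
```

With a constant positive reward, the synapse whose presynaptic spikes precede the postsynaptic spikes is strengthened and the anti-causal one is weakened. Flipping the sign of the reward reverses both changes. A reward delivered only after the spiking has stopped (for example `reward[1960:] = 1`) still produces changes in the same directions, because the pairings are held in `c` until the reward arrives. This toy setting does not model the variable spontaneous activity or the recurrent network that the authors' biofeedback model relies on.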
