Venue: International Workshop on Signal Processing Advances in Wireless Communications

Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients

Abstract

Artificial Neural Networks (ANNs) are currently used as function approximators in many state-of-the-art Reinforcement Learning (RL) algorithms. Spiking Neural Networks (SNNs) have been shown to drastically reduce the energy consumption of ANNs by encoding information in sparse temporal binary spike streams, thereby emulating the communication mechanism of biological neurons. Owing to their low energy consumption, SNNs are considered important candidates for co-processors in mobile devices. In this work, the use of SNNs as stochastic policies is explored under an energy-efficient first-to-spike action rule, whereby the action taken by the RL agent is determined by the first spike to occur among the output neurons. A policy gradient-based algorithm is derived assuming a Generalized Linear Model (GLM) for spiking neurons. Experimental results demonstrate that online-trained SNNs used as stochastic policies can gracefully trade off energy consumption, measured by the number of spikes, against control performance. Significant gains are shown compared with the standard approach of converting an offline-trained ANN into an SNN.
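
The first-to-spike action rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a simplified, memoryless Bernoulli GLM in which each output neuron spikes at every time step with probability sigmoid(bias + weights · input), it breaks ties between simultaneous spikes uniformly at random, and it returns no action if no output neuron spikes within the input window. The function name, argument layout, and return values are hypothetical.

```python
import math
import random


def sigmoid(u):
    """Logistic function mapping a membrane potential to a spike probability."""
    return 1.0 / (1.0 + math.exp(-u))


def first_to_spike_action(weights, biases, inputs, rng):
    """Pick an action as the index of the first output neuron to spike.

    weights[i] : input weights of output neuron i (hypothetical layout)
    biases[i]  : bias of output neuron i
    inputs[t]  : binary input spike vector at time step t
    rng        : a random.Random instance

    Returns (action, decision_time, output_spikes). The action is None
    when no output neuron spikes within the input window (an assumption
    of this sketch, not a rule taken from the source).
    """
    for t, x in enumerate(inputs):
        fired = []
        for i, (w, b) in enumerate(zip(weights, biases)):
            # Simplified GLM: potential depends only on the current input.
            u = b + sum(wj * xj for wj, xj in zip(w, x))
            if rng.random() < sigmoid(u):  # Bernoulli spike
                fired.append(i)
        if fired:
            # Stop at the first spike; ties are broken at random (assumption).
            return rng.choice(fired), t, len(fired)
    return None, len(inputs), 0


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[1.5, -0.5], [-0.5, 1.5]]      # two actions, two input neurons
    biases = [-2.0, -2.0]
    inputs = [[1, 0], [1, 0], [0, 1], [1, 0]]  # toy input spike train
    print(first_to_spike_action(weights, biases, inputs, rng))
```

Because the simulation stops at the first output spike, the number of output spikes emitted per decision stays small, which is the quantity the abstract uses as its proxy for energy consumption.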
