Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients

Abstract

Artificial Neural Networks (ANNs) are currently being used as function approximators in many state-of-the-art Reinforcement Learning (RL) algorithms. Spiking Neural Networks (SNNs) have been shown to drastically reduce the energy consumption of ANNs by encoding information in sparse temporal binary spike streams, hence emulating the communication mechanism of biological neurons. Due to their low energy consumption, SNNs are considered to be important candidates as co-processors to be implemented in mobile devices. In this work, the use of SNNs as stochastic policies is explored under an energy-efficient first-to-spike action rule, whereby the action taken by the RL agent is determined by the occurrence of the first spike among the output neurons. A policy gradient-based algorithm is derived considering a Generalized Linear Model (GLM) for spiking neurons. Experimental results demonstrate the capability of online trained SNNs as stochastic policies to gracefully trade energy consumption, as measured by the number of spikes, and control performance. Significant gains are shown as compared to the standard approach of converting an offline trained ANN into an SNN.
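
The first-to-spike rule and a policy-gradient update described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a memoryless simplification of a GLM neuron in which each output neuron spikes independently at every time step with probability sigmoid(w · x). The function name first_to_spike_action, the uniform tie-breaking, the uniform fallback when no neuron spikes within T steps, the toy reward, and the learning rate are all assumptions made for illustration.

```python
import math
import random


def first_to_spike_action(W, x, T, rng):
    """Sample an action under a first-to-spike rule (illustrative sketch).

    W: list of weight rows, one per output neuron; x: list of input features.
    Returns (action, n_steps, grad), where grad[i][j] is the derivative of the
    log-probability of the sampled spike pattern with respect to W[i][j].
    """
    # Per-step spike probability of each output neuron (memoryless sigmoid; assumption).
    p = [1.0 / (1.0 + math.exp(-sum(w * xj for w, xj in zip(row, x)))) for row in W]
    grad = [[0.0] * len(x) for _ in W]
    for t in range(T):
        spikes = [rng.random() < pi for pi in p]  # independent Bernoulli spikes
        for i, (s, pi) in enumerate(zip(spikes, p)):
            for j, xj in enumerate(x):
                grad[i][j] += (float(s) - pi) * xj  # Bernoulli-sigmoid score function
        winners = [i for i, s in enumerate(spikes) if s]
        if winners:
            # The first neuron to spike selects the action; ties broken uniformly (assumption).
            return rng.choice(winners), t + 1, grad
    # No spike within T steps: fall back to a uniform random action (assumption).
    return rng.randrange(len(W)), T, grad


rng = random.Random(0)
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # two actions, three input features
x = [1.0, 0.5, -0.5]
action, steps, grad = first_to_spike_action(W, x, T=10, rng=rng)

reward = 1.0 if action == 0 else 0.0  # toy reward, purely illustrative
lr = 0.1                              # hypothetical learning rate
# REINFORCE-style ascent step: W <- W + lr * reward * grad
W = [[w + lr * reward * g for w, g in zip(w_row, g_row)]
     for w_row, g_row in zip(W, grad)]
```

In this sketch the episode stops at the first output spike, so a decision reached after fewer time steps also produces fewer spikes, which is the quantity the abstract uses as a proxy for energy consumption.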

Bibliographic details

Source
International Workshop on Signal Processing Advances in Wireless Communications | 2019 | pp. 1-5 | 5 pages
Authors
Bleema Rosenfeld; Osvaldo Simeone; Bipin Rajendran;
Original format: PDF
Keywords
function approximation; gradient methods; learning (artificial intelligence); learning systems; neurocontrollers; stochastic processes;

