Journal: Systems and Computers in Japan

A Pulse Neural Network Reinforcement Learning Algorithm for Partially Observable Markov Decision Processes



Abstract

This paper considers learning in pulse neural networks and proposes a new reinforcement learning algorithm that exploits the ability of pulse neuron elements to process time series. The conventional integrator neuron element models the average firing rate of a biological neuron. The pulse neuron, by contrast, models the input-output relation of time-series pulses (spikes) and the decay of the internal state (internal potential). Applications of such networks have been considered in recent engineering studies. In particular, a pulse neuron with a high decay rate is known to act as a coincidence detector. The proposed model combines pulse neuron elements with different decay rates, which facilitates both the processing of time-series input and the discrimination of fuzzy states in a partially observable Markov decision process. The proposed network is a four-layer feedforward network in which the pulse neuron elements forming the two hidden layers provide a pseudo-representation of the state of the environment. These elements generate a secondary reinforcement signal, which yields learning similar to conventional reinforcement learning schemes based on a state evaluation function. A computer experiment verifies that the proposed model works effectively in a strongly partially observable environment.
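
The claim that a pulse neuron with a high decay rate acts as a coincidence detector can be illustrated with a small simulation. The sketch below is not the paper's model, which the abstract does not specify. It assumes a simple discrete-time leaky neuron with a reset after firing, and the function name `pulse_neuron`, the weights, the threshold, and the two decay values are all hypothetical choices made for illustration.

```python
def pulse_neuron(spike_trains, weights, decay, threshold):
    """Discrete-time leaky pulse neuron (illustrative assumption, not the paper's model).

    decay is the fraction of the internal potential lost at each time step.
    Returns one output value (0 or 1) per time step.
    """
    potential = 0.0
    output = []
    # zip(*spike_trains) yields the tuple of input spikes arriving at each step.
    for inputs in zip(*spike_trains):
        # Decay the internal potential, then add the weighted input spikes.
        potential = (1.0 - decay) * potential
        potential += sum(w * s for w, s in zip(weights, inputs))
        if potential >= threshold:
            output.append(1)
            potential = 0.0  # reset after firing (assumed)
        else:
            output.append(0)
    return output


# Two hypothetical input spike trains that coincide only at t = 5.
a = [1, 0, 0, 0, 0, 1, 0, 0]
b = [0, 0, 1, 0, 0, 1, 0, 0]
weights = [0.6, 0.6]  # a single spike alone stays below the threshold

fast = pulse_neuron([a, b], weights, decay=0.9, threshold=1.0)
slow = pulse_neuron([a, b], weights, decay=0.1, threshold=1.0)

print("high decay:", fast)
print("low decay: ", slow)
```

Under these assumptions, the high-decay neuron fires only at the step where both inputs spike together, while the low-decay neuron also fires at t = 2 because it still retains part of the spike received at t = 0. Combining elements with different decay rates therefore gives a network access to both coincidence information and a short history of past inputs, which is the property the abstract relies on for telling apart states that look alike in a single observation.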
