International Joint Conference on Neural Networks

Continuous-time on-policy neural Reinforcement Learning of working memory tasks



Abstract

As living organisms, we are characterized by the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems cannot respond rapidly and efficiently in the real world: the challenge is to learn both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly (to learn the when), the environment has to be sampled often enough. To define “enough”, a programmer has to choose a step size as the representation of time, picking between a fine-grained representation of time (many state transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL, but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA learning in a working-memory neural network model, AuGMEnT. While the neural working-memory network resolves the what problem, our when solution builds on the notion that, in the real world, instantaneous actions of duration dt are impossible. We demonstrate how an action selection system lets us decouple action duration from the internal time-steps of the neural RL model. The resulting CT-AuGMEnT successfully learns to react to the events of a continuous-time task without any pre-imposed specification of the duration of events or the delays between them.
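The abstract gives no update equations, so the following is only a minimal illustrative sketch of the two ideas it names: an on-policy SARSA update whose reward, discount, and learning rate are scaled by the internal step size dt, and an action selection system that lets a chosen action persist across many internal time-steps. All names here (ct_sarsa_update, select_action, hysteresis) are hypothetical, and the tabular form is a simplification; this is not the AuGMEnT network update itself.

    import numpy as np

    def ct_sarsa_update(Q, s, a, r, s_next, a_next, dt, alpha=0.1, gamma=0.9):
        # One on-policy SARSA update over an internal time-step of length dt.
        # Reward, discount, and learning rate are scaled by dt so the learned
        # values are roughly invariant to the choice of internal step size.
        target = r * dt + (gamma ** dt) * Q[s_next, a_next]
        td_error = target - Q[s, a]
        Q[s, a] += alpha * dt * td_error
        return td_error

    def select_action(q_row, current_a, rng, beta=2.0, hysteresis=0.1):
        # Softmax selection with a small bonus for the ongoing action, so an
        # action tends to persist across many internal steps: action duration
        # is decoupled from dt and ends only when another action wins.
        prefs = q_row.copy()
        if current_a is not None:
            prefs[current_a] += hysteresis
        p = np.exp(beta * (prefs - prefs.max()))
        p /= p.sum()
        return int(rng.choice(len(q_row), p=p))

Under these assumptions, shrinking dt increases how often the environment is sampled without shortening the actions themselves, which is the decoupling of action duration from the internal time-step that the abstract describes.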