International Joint Conference on Neural Networks

Continuous-time on-policy neural Reinforcement Learning of working memory tasks



Abstract

As living organisms, we are characterized by the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems cannot respond rapidly and efficiently in the real world: the challenge is to learn both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly (to learn the when), the environment has to be sampled often enough. To define “enough”, a programmer has to choose a step size as the representation of time, picking between a fine-grained representation of time (many state transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL, but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA learning in a working-memory neural network model, AuGMEnT. While the neural working-memory network resolves the what problem, our when solution builds on the notion that, in the real world, instantaneous actions of duration dt are impossible. We demonstrate how an action selection system lets us decouple action duration from the internal time-steps of the neural RL model. The resulting CT-AuGMEnT successfully learns to react to the events of a continuous-time task without any pre-imposed specification of the duration of events or the delays between them.
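The abstract gives no update equations, so the following is only a minimal illustrative sketch of the two ideas it names: an on-policy SARSA update whose reward, discount, and learning rate are scaled by the internal step size dt, and an action selection system that lets a chosen action persist across many internal time-steps. All names here (ct_sarsa_update, select_action, hysteresis) are hypothetical, and the tabular form is a simplification; this is not the AuGMEnT network update itself.

    import numpy as np

    def ct_sarsa_update(Q, s, a, r, s_next, a_next, dt, alpha=0.1, gamma=0.9):
        # One on-policy SARSA update over an internal time-step of length dt.
        # Reward, discount, and learning rate are scaled by dt so the learned
        # values are roughly invariant to the choice of internal step size.
        target = r * dt + (gamma ** dt) * Q[s_next, a_next]
        td_error = target - Q[s, a]
        Q[s, a] += alpha * dt * td_error
        return td_error

    def select_action(q_row, current_a, rng, beta=2.0, hysteresis=0.1):
        # Softmax selection with a small bonus for the ongoing action, so an
        # action tends to persist across many internal steps: action duration
        # is decoupled from dt and ends only when another action wins.
        prefs = q_row.copy()
        if current_a is not None:
            prefs[current_a] += hysteresis
        p = np.exp(beta * (prefs - prefs.max()))
        p /= p.sum()
        return int(rng.choice(len(q_row), p=p))

Under these assumptions, shrinking dt increases how often the environment is sampled without shortening the actions themselves, which is the decoupling of action duration from the internal time-step that the abstract describes.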