
Continuous-time on-policy neural reinforcement learning of working memory tasks

International Joint Conference on Neural Networks


Abstract

As living organisms, one of our primary characteristics is the ability to rapidly process and react to unknown and unexpected events. To this end, we are able to recognize an event or a sequence of events and learn to respond properly. Despite advances in machine learning, current cognitive robotic systems are not able to respond rapidly and efficiently in the real world: the challenge is to learn to recognize both what is important and when to act. Reinforcement Learning (RL) is typically used to solve complex tasks: to learn the how. To respond quickly - to learn the when - the environment has to be sampled often enough. To define "often enough", a programmer has to fix a step size as the representation of time, choosing between a fine-grained representation of time (many state transitions; difficult to learn with RL) and a coarse temporal resolution (easier to learn with RL, but lacking precise timing). Here, we derive a continuous-time version of on-policy SARSA learning in a working-memory neural network model, AuGMEnT. While a neural working-memory network resolves the what problem, our when solution is built on the notion that, in the real world, instantaneous actions of duration dt are actually impossible. We demonstrate how the action duration can be decoupled from the internal time steps of the neural RL model using an action selection system. The resulting CT-AuGMEnT successfully learns to react to the events of a continuous-time task without any pre-imposed specification of the duration of the events or of the delays between them.
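To make the decoupling of action duration from the internal step size concrete, below is a minimal tabular sketch of continuous-time, on-policy (epsilon-greedy) SARSA in which the per-step discount is scaled by the elapsed internal step dt. The ToyCueEnv environment, the ct_sarsa and eps_greedy functions, and all parameter values are illustrative assumptions introduced here; they stand in for, and are much simpler than, the paper's working-memory AuGMEnT network and its action selection system.

```python
from collections import defaultdict
import numpy as np

# Hedged sketch of continuous-time, on-policy SARSA in which the action
# duration is decoupled from the internal integration step `dt`. The tabular
# Q, the toy environment and all parameter values are illustrative
# assumptions; the paper's CT-AuGMEnT uses a working-memory neural network
# and a separate action-selection system instead of a table.

class ToyCueEnv:
    """A 'go' cue appears after a hidden delay; responding after it pays off."""
    def __init__(self, delay=1.0):
        self.delay, self.t = delay, 0.0

    def reset(self):
        self.t = 0.0
        return 0                                   # state 0: cue not yet shown

    def step(self, action, dt):
        self.t += dt
        state = 1 if self.t >= self.delay else 0   # state 1: cue visible
        done = (action == 1)                       # action 1 = "respond"
        reward = 1.0 if (done and self.t >= self.delay) else 0.0
        return state, reward, done


def eps_greedy(q, state, actions, eps, rng):
    """On-policy epsilon-greedy action selection."""
    if rng.random() < eps:
        return int(rng.choice(actions))
    return max(actions, key=lambda a: q[(state, a)])


def ct_sarsa(env, actions=(0, 1), dt=0.05, gamma=0.99, alpha=0.1,
             eps=0.1, episodes=500, t_max=5.0):
    """Tabular SARSA where discounting scales with the elapsed time dt."""
    q, rng = defaultdict(float), np.random.default_rng(0)
    for _ in range(episodes):
        state, t = env.reset(), 0.0
        action = eps_greedy(q, state, actions, eps, rng)
        while t < t_max:
            nxt, reward, done = env.step(action, dt)
            nxt_action = eps_greedy(q, nxt, actions, eps, rng)
            # Per-step discount gamma**dt: halving dt doubles the number of
            # updates but keeps the effective discount over real time fixed,
            # so the value estimates do not hinge on how finely time is sampled.
            target = reward + (0.0 if done else (gamma ** dt) * q[(nxt, nxt_action)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action, t = nxt, nxt_action, t + dt
            if done:
                break
    return q


if __name__ == "__main__":
    q = ct_sarsa(ToyCueEnv())
    print({k: round(v, 2) for k, v in q.items()})
```

In this sketch the environment is advanced at the fine internal step dt while the chosen action simply persists until the selection rule commits to a new one, which is the same design choice the abstract describes: the agent's internal sampling rate can be refined without having to redefine how long its actions last.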
