The Journal of Supercomputing

Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits


Abstract

Reinforcement learning (RL) techniques have contributed tremendously, and continue to contribute, to the advancement of machine learning and its many recent applications. As is well known, the main limitations of existing RL techniques are, in general, their slow convergence and their computational complexity. The contributions of this paper are twofold. (1) First, it introduces a reinforcement-learning technique that uses multiple lookahead levels, granting an autonomous agent more visibility into its environment and helping it learn faster. The technique extends Watkins's Q-learning algorithm with the Multiple-Lookahead-Levels (MLL) model equation that we develop and present here. We analyze the convergence of the MLL equation and prove its effectiveness, and we propose and implement a method to compute the improvement rate of the agent's learning speed between different lookahead levels. Both time and space complexities are examined. Results show that the number of steps per learning path required to reach the goal decreases exponentially with the learning-path number (time). They also show that, at any given time, the number of steps per learning path is somewhat lower when the number of lookahead levels is higher (space). Furthermore, we analyze the MLL system in the time domain and prove its temporal stability using Lyapunov theory. (2) Second, building on this Lyapunov stability analysis, we propose, for the first time, a circuit architecture for a software-configurable hardware implementation of the MLL technique for real-time applications.
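
The MLL model equation itself is not reproduced in this abstract. For orientation, Watkins's one-step Q-learning update, which the MLL equation extends, is

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big],
\]

and one plausible form of an L-level lookahead target (an illustrative sketch, not necessarily the paper's formulation) chains the discounted backup over L model-predicted steps before bootstrapping on Q:

\[
y_t^{(L)} = \sum_{k=1}^{L} \gamma^{k-1} r_{t+k} + \gamma^{L} \max_{a'} Q(s_{t+L}, a'),
\]

where the actions along the lookahead path are expanded greedily under a model of the environment, and L = 1 recovers the standard Watkins target.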
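
As a concrete toy illustration of the lookahead idea, the following Python sketch runs tabular Q-learning on a deterministic 5x5 gridworld and deepens the backup target over a configurable number of lookahead levels. The gridworld, reward scheme, hyperparameters, and the greedy lookahead expansion are all illustrative assumptions, not the paper's MLL implementation.

import numpy as np

# Toy sketch: lookahead-augmented tabular Q-learning on a deterministic
# 5x5 gridworld. Environment, rewards, and the greedy L-level lookahead
# expansion are illustrative assumptions, not the paper's MLL equation.

SIZE = 5
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    """Deterministic environment model: move, clipping at the walls."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def lookahead_target(Q, state, a, gamma, level):
    """Discounted backup over `level` model-predicted steps, bootstrapping
    on Q at the frontier; level == 1 is the standard Watkins target."""
    nxt, reward = step(state, a)
    if level == 1 or nxt == GOAL:
        return reward + gamma * np.max(Q[nxt])
    # Expand deeper levels greedily over all actions (illustrative choice).
    return reward + gamma * max(
        lookahead_target(Q, nxt, b, gamma, level - 1)
        for b in range(len(ACTIONS)))

def train(levels=1, episodes=100, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    steps_per_episode = []
    for _ in range(episodes):
        s, n = (0, 0), 0
        while s != GOAL:
            # epsilon-greedy action selection
            a = (int(rng.integers(len(ACTIONS))) if rng.random() < eps
                 else int(np.argmax(Q[s])))
            Q[s][a] += alpha * (lookahead_target(Q, s, a, gamma, levels)
                                - Q[s][a])
            s, _ = step(s, a)
            n += 1
        steps_per_episode.append(n)
    return steps_per_episode

if __name__ == "__main__":
    for lvl in (1, 2, 3):
        steps = train(levels=lvl)
        print(f"levels={lvl}: episode 1 took {steps[0]} steps, "
              f"episode {len(steps)} took {steps[-1]}")

Running the sketch makes it easy to check whether deeper lookahead levels reduce the step counts of early episodes, mirroring the trend the abstract reports.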
