
Reward-driven Training of Random Boolean Network Reservoirs for Model-Free Environments



Abstract

Reservoir Computing (RC) is an emerging machine learning paradigm in which a fixed kernel, built from a randomly connected "reservoir" with sufficiently rich dynamics, is capable of expanding the problem space in a non-linear fashion to a higher-dimensional feature space. These features can then be interpreted by a linear readout layer that is trained by a gradient descent method. In comparison to traditional neural networks, only the output layer needs to be trained, which leads to a significant computational advantage. In addition, the short-term memory of the reservoir dynamics has the ability to transform a complex temporal input state space into a simple non-temporal representation. Adaptive real-time systems pose multi-stage decision problems in which an agent is trained to achieve a preset goal by performing an optimal action at each timestep. In such problems, the agent learns through continuous interactions with its environment. Conventional techniques for solving such problems become computationally expensive, or may fail to converge, when the state space under consideration is large, partially observable, or when short-term memory is required for optimal decision making. The objective of this thesis is to use reservoir computers to solve such goal-driven tasks, where no error signal can be readily calculated to apply gradient descent methodologies. To address this challenge, we propose a novel reinforcement learning approach in combination with reservoir computers built from simple Boolean components. Such reservoirs are of interest because they have the potential to be fabricated by self-assembly techniques. We evaluate the performance of our approach in both Markovian and non-Markovian environments, comparing it against agents trained through traditional Q-Learning. We find that the reservoir-based agent performs successfully in these problem contexts and even performs marginally better than the Q-Learning agents in certain cases. Our proposed approach retains the advantage of traditional parameterized dynamic systems in successfully modeling embedded state-space representations, while eliminating the complexity involved in training traditional neural networks. To the best of our knowledge, our method of training a reservoir readout layer through an on-policy bootstrapping approach is unique in the field of random Boolean network reservoirs.
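The abstract combines two ingredients: a random Boolean network (RBN) acting as a fixed, high-dimensional feature expander, and a linear readout trained by reward-driven, on-policy bootstrapped updates rather than by gradient descent on a supervised error signal. The following is a minimal sketch of what such a setup could look like, assuming an N-node RBN with in-degree K per node, input injected by overwriting the first few node states, and a SARSA-style update for the readout weights; the sizes, injection scheme, and function names (step, choose_action, sarsa_update) are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Random Boolean network reservoir (illustrative sizes) ---
N, K = 100, 2                                   # nodes, in-degree per node
wiring = rng.integers(0, N, size=(N, K))        # each node reads K random nodes
tables = rng.integers(0, 2, size=(N, 2 ** K))   # a random Boolean function per node

def step(state, u):
    """Advance the RBN one tick. `u` is a binary input vector injected
    by overwriting the first len(u) node states (one simple scheme)."""
    s = state.copy()
    s[: len(u)] = u
    idx = (s[wiring] * (2 ** np.arange(K))).sum(axis=1)  # input pattern per node
    return tables[np.arange(N), idx]

# --- Linear readout trained by on-policy bootstrapping (SARSA-style) ---
n_actions, alpha, gamma, eps = 2, 0.05, 0.95, 0.1
W = np.zeros((n_actions, N))                    # readout weights: Q(s, a) = W[a] @ s

def choose_action(s):
    if rng.random() < eps:                      # epsilon-greedy exploration
        return int(rng.integers(n_actions))
    return int(np.argmax(W @ s))

def sarsa_update(s, a, r, s_next, a_next):
    """Bootstrapped TD update: the target uses the Q-value of the action
    actually taken next, so no environment model is needed."""
    td_error = r + gamma * (W[a_next] @ s_next) - (W[a] @ s)
    W[a] += alpha * td_error * s                # gradient of the linear Q w.r.t. W[a]
```

The on-policy bootstrapping is what replaces the missing error signal: the temporal-difference target is built from the reward and the Q-value of the action the agent actually takes next, so the readout can be trained online from rewards alone while the Boolean reservoir itself stays fixed.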

Record details

  • Author

    Gargesa Padmashri;

  • Author affiliation
  • Year 2013
  • Total pages
  • Format PDF
  • Language
  • Classification
