Journal: Neurocomputing

Experiments of conditioned reinforcement learning in continuous space control tasks

Abstract

The key issue that prevents the application of Reinforcement Learning (RL) methods in complex control scenarios is the lack of convergence to meaningful decision policies (i.e. policies that differ significantly from random decisions), due to the huge state-action spaces to be explored. Providing the agent with initial domain knowledge alleviates this problem; this is known as Conditioned RL (CRL). In high-dimensional continuous state-action and reward domains, CRL is often the only feasible approach to reach meaningful decision policies. In this kind of system, RL is carried out by Actor-Critic approaches, and the state-action value functionals are modeled by Value Function Approximations (VFA). CRL methods make use of an existing reference controller, i.e. the teacher controller, which provides the initial domain knowledge to the agent under training. The teacher controller can be used in two ways to build the VFA of the state-action value and state transition functions that determine the action selection policy: (1) providing the desired output for a supervised learning process, or (2) directly using it to build them. We have carried out experiments comparing CRL methods and unconditioned Actor-Critic agents in three different control benchmark scenarios. Results show that both agent conditioning approaches yield significant performance improvements. Under tight computational time constraints, the CRL approaches were able to learn efficient policies, while the unconditioned agents were not able to find any acceptable policy in the benchmark control scenarios.
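The following minimal sketch is not taken from the paper; it only illustrates option (1) from the abstract, supervised conditioning of an initial policy from teacher-controller data, before any Actor-Critic refinement. The double-integrator plant, the PD teacher, and the linear actor are all assumptions made for illustration, not the authors' benchmark scenarios or VFA models.

```python
# Hypothetical sketch: warm-starting an actor from a teacher controller.
# Plant, teacher gains, and actor form are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def teacher_controller(state):
    """Reference (teacher) controller: a hand-tuned PD law."""
    pos, vel = state
    return np.clip(-2.0 * pos - 1.0 * vel, -1.0, 1.0)

def step(state, action, dt=0.05):
    """Assumed double-integrator dynamics, used only to generate teacher data."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

# (1) Collect (state, action) pairs by rolling out the teacher controller.
states, actions = [], []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0, size=2)
    for _ in range(50):
        a = teacher_controller(s)
        states.append(s)
        actions.append(a)
        s = step(s, a)
X = np.array(states)   # shape (N, 2)
y = np.array(actions)  # shape (N,)

# (2) Supervised conditioning: fit a linear actor to imitate the teacher.
#     This plays the role of the "desired output for a supervised learning
#     process" mentioned in the abstract; a neural VFA could be used instead.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def conditioned_actor(state):
    """Initial policy for the Actor-Critic agent, warm-started by the teacher."""
    return float(np.clip(np.r_[state, 1.0] @ w, -1.0, 1.0))

# An Actor-Critic loop would start from conditioned_actor instead of a random
# policy and then refine it with the usual TD-error-driven updates.
print(conditioned_actor(np.array([0.5, -0.2])))
```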
