...
Journal of Harbin Institute of Technology

Reinforcement learning with partitioning function system


Abstract

The size of the state space is the limiting factor in applying reinforcement learning algorithms to practical problems. A reinforcement learning system with a partitioning function (RLWPF) is established, in which the state space is partitioned into several regions. The operating principle of RLWPF is based on a semi-Markov decision process (SMDP) and is general: it can be applied to any reinforcement learning problem with a large state space. In RLWPF, the partitioning module dispatches agents into different regions in order to reduce the state space each agent must handle. This article proves the convergence of the SARSA algorithm for an SMDP, and establishes the convergence of RLWPF by analyzing the equivalence of the value functions of the two SMDPs before and after partitioning. It also shows that the optimal policy learned by RLWPF is consistent with prior domain knowledge. As an application, an elevator group control system is devised to decrease the average waiting time of passengers, with four agents each controlling one of four elevator cars. Based on RLWPF, a partitioning module is developed that uses a uniform round-trip time as the partitioning criterion, so that most passengers wait roughly the same amount of time; each elevator car then answers only the hall calls in its own region. Performance results show the advantage of RLWPF over both conventional elevator systems and reinforcement learning systems without a partitioning module.
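The abstract does not spell out the update rule. The following is therefore only a minimal Python sketch under stated assumptions.

It uses the standard discrete-time SMDP form of SARSA, Q(s,a) ← Q(s,a) + α[Σ_{k<τ} γ^k r_k + γ^τ Q(s',a') − Q(s,a)], where τ is the number of time steps the action lasted. When τ = 1, this reduces to ordinary SARSA.

For the partition, it uses a hand-picked split of floors into zones rather than the paper's uniform round-trip-time criterion.

All names (`smdp_sarsa_update`, `region_of`, `make_agents`), the zone boundaries, and the toy states and rewards are hypothetical.

```python
import bisect
from collections import defaultdict


def smdp_sarsa_update(Q, s, a, rewards, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA backup to Q for a discrete-time semi-Markov transition.

    `rewards` holds the per-step rewards observed while action `a` was in
    progress, so len(rewards) is the sojourn time tau of this transition.
    (Assumed SMDP form; the paper's exact update is not given in the abstract.)
    """
    tau = len(rewards)
    if tau == 0:
        raise ValueError("a transition must last at least one time step")
    # Reward accumulated during the sojourn, discounted inside the interval.
    r = sum((gamma ** k) * r_k for k, r_k in enumerate(rewards))
    # Bootstrap from the next (state, action) pair, discounted by gamma**tau.
    target = r + (gamma ** tau) * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def region_of(floor, boundaries):
    """Map a floor to a region index (hypothetical partitioning function).

    `boundaries` is a sorted list of the lowest floor of every region except
    the first, e.g. [6, 11, 16] splits floors 1-20 into four zones.
    """
    return bisect.bisect_right(boundaries, floor)


def make_agents(n_regions):
    """Create one independent Q-table per region (one agent per car)."""
    return [defaultdict(float) for _ in range(n_regions)]


if __name__ == "__main__":
    boundaries = [6, 11, 16]  # hypothetical zone limits, not from the paper
    agents = make_agents(len(boundaries) + 1)
    call_floor = 12
    k = region_of(call_floor, boundaries)  # region 2 for these boundaries
    # Hypothetical local states/actions and per-step waiting-time penalties;
    # only agent k learns from this hall call.
    smdp_sarsa_update(agents[k], ("idle", call_floor), "go", [-1.0, -1.0, -1.0],
                      ("stopped", call_floor), "wait")
    print(k, dict(agents[k]))
```

Keeping one Q-table per region is what shrinks each agent's state space. `region_of` routes each hall call to exactly one agent, and only that agent's table is updated.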
