...
Source journal: IEEE Transactions on Neural Networks and Learning Systems

MEC—A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems



Abstract

In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any knowledge of the system dynamics is proposed. It combines state aggregation with the principle of efficient exploration and makes full use of samples observed online. A grid partitions the continuous state space into cells, and the observed samples are stored per cell. A near-upper Q operator is defined to produce a near-upper Q function from the samples in each cell, and the corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove that the number of steps on which the algorithm executes nonoptimal actions is bounded by a polynomial. After finitely many steps, the resulting policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has low computational complexity. Simulation studies confirm that it performs better than other comparable PAC algorithms.
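To make the state-aggregation idea concrete, the following Python sketch shows a grid partition of a continuous state space with an optimistically initialized Q table standing in for the near-upper Q function, plus greedy action selection. It is a minimal illustration under assumed names (GridQAgent, cells_per_dim, vmax are hypothetical), not the authors' MEC implementation, whose near-upper Q operator and PAC analysis are more involved.

```python
import numpy as np

class GridQAgent:
    """Sketch: grid-based state aggregation with an optimistic ("near-upper") Q table."""

    def __init__(self, low, high, cells_per_dim, n_actions, gamma=0.99, vmax=100.0):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.cells = cells_per_dim
        self.gamma = gamma
        # Optimistic initialization of every (cell, action) value drives
        # exploration of unvisited cells; vmax is an assumed value upper bound.
        self.Q = np.full((cells_per_dim,) * self.low.size + (n_actions,), vmax)

    def cell(self, state):
        """Map a continuous state to the index of its grid cell."""
        frac = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.cells).astype(int), 0, self.cells - 1)
        return tuple(idx)

    def act(self, state):
        """Greedy action with respect to the (near-upper) Q function."""
        return int(np.argmax(self.Q[self.cell(state)]))

    def update(self, state, action, reward, next_state):
        """Backup from one observed deterministic transition, kept monotone
        non-increasing so the table remains an upper-style estimate."""
        target = reward + self.gamma * np.max(self.Q[self.cell(next_state)])
        sa = self.cell(state) + (action,)
        self.Q[sa] = min(self.Q[sa], target)

# Example usage on a 2-D state space with 3 discrete actions (illustrative values):
# agent = GridQAgent(low=[-1.0, -1.0], high=[1.0, 1.0], cells_per_dim=20, n_actions=3)
```

Because unvisited cells keep their optimistic value, the greedy policy is drawn toward them until their estimates are reduced by observed samples, which is the exploration-exploitation balance the abstract describes.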


