...
Source journal: IEEE Transactions on Neural Networks and Learning Systems

MEC—A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems



Abstract

In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any knowledge of the system dynamics is proposed. It combines state aggregation with the principle of efficient exploration and makes full use of samples observed online. A grid partitions the continuous state space into cells, and the observed samples are stored per cell. A near-upper Q operator is defined to produce a near-upper Q function from the samples in each cell, and the corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove that the number of steps on which the algorithm executes nonoptimal actions is bounded by a polynomial. After finitely many steps, the resulting policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has low computational complexity. Simulation studies confirm that it performs better than other comparable PAC algorithms.
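To make the state-aggregation idea concrete, the following Python sketch shows a grid partition of a continuous state space with an optimistically initialized Q table standing in for the near-upper Q function, plus greedy action selection. It is a minimal illustration under assumed names (GridQAgent, cells_per_dim, vmax are hypothetical), not the authors' MEC implementation, whose near-upper Q operator and PAC analysis are more involved.

```python
import numpy as np

class GridQAgent:
    """Sketch: grid-based state aggregation with an optimistic ("near-upper") Q table."""

    def __init__(self, low, high, cells_per_dim, n_actions, gamma=0.99, vmax=100.0):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.cells = cells_per_dim
        self.gamma = gamma
        # Optimistic initialization of every (cell, action) value drives
        # exploration of unvisited cells; vmax is an assumed value upper bound.
        self.Q = np.full((cells_per_dim,) * self.low.size + (n_actions,), vmax)

    def cell(self, state):
        """Map a continuous state to the index of its grid cell."""
        frac = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.cells).astype(int), 0, self.cells - 1)
        return tuple(idx)

    def act(self, state):
        """Greedy action with respect to the (near-upper) Q function."""
        return int(np.argmax(self.Q[self.cell(state)]))

    def update(self, state, action, reward, next_state):
        """Backup from one observed deterministic transition, kept monotone
        non-increasing so the table remains an upper-style estimate."""
        target = reward + self.gamma * np.max(self.Q[self.cell(next_state)])
        sa = self.cell(state) + (action,)
        self.Q[sa] = min(self.Q[sa], target)

# Example usage on a 2-D state space with 3 discrete actions (illustrative values):
# agent = GridQAgent(low=[-1.0, -1.0], high=[1.0, 1.0], cells_per_dim=20, n_actions=3)
```

Because unvisited cells keep their optimistic value, the greedy policy is drawn toward them until their estimates are reduced by observed samples, which is the exploration-exploitation balance the abstract describes.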


