A kernel based true online Sarsa(λ) for continuous space control problems

Fei Zhu; Haijun Zhu; Yuchen Fu; Xiaoke Zhou


Abstract

Reinforcement learning is an efficient method for control problems that obtains an optimal policy by interacting with the environment. However, it faces challenges such as low convergence accuracy and slow convergence. Moreover, conventional reinforcement learning algorithms can hardly solve continuous control problems. Kernel-based methods can accelerate convergence and improve convergence accuracy, and the policy gradient method is a good way to deal with continuous space problems. We propose a Sarsa(λ) version of the true online temporal difference algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), on the basis of the clustering-based sample specification method and a selective kernel-based value function. The TOSarsa(λ) algorithm gives consistent results under both the forward view and the backward view, which ensures that an optimal policy is obtained in less time. We also combine TOSarsa(λ) with heuristic dynamic programming. The experiments show that the proposed algorithm works well on continuous control problems.
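To make the kind of update the abstract refers to more concrete, the following is a minimal sketch of a generic true online Sarsa(λ) step with a dutch eligibility trace over Gaussian kernel features. It assumes a linear action-value function and a scalar state; it does not reproduce the authors' clustering-based sample selection, their selective kernel, or the heuristic dynamic programming variant. The names `rbf_features`, `true_online_sarsa_step`, `centers`, and `width` are hypothetical and chosen only for illustration.

```python
import math


def rbf_features(state, action, centers, width, n_actions):
    """Gaussian kernel features for a scalar state, one block per action.

    `centers` and `width` are illustrative assumptions, not values from the paper.
    """
    phi = [0.0] * (len(centers) * n_actions)
    offset = action * len(centers)
    for i, c in enumerate(centers):
        phi[offset + i] = math.exp(-((state - c) ** 2) / (2.0 * width ** 2))
    return phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def true_online_sarsa_step(w, z, q_old, x, x_next, reward, alpha, gamma, lam):
    """One true online Sarsa(lambda) update with a dutch trace.

    w: weights, z: eligibility trace, x / x_next: feature vectors of the
    current and next state-action pair (pass all zeros for a terminal state).
    Returns the updated (w, z, q_old).
    """
    q = dot(w, x)
    q_next = dot(w, x_next)
    delta = reward + gamma * q_next - q  # TD error
    zx = dot(z, x)
    # Dutch trace: decay the old trace, then add a corrected copy of x.
    z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
         for zi, xi in zip(z, x)]
    # Weight update with the correction term that distinguishes the
    # true online variant from conventional Sarsa(lambda).
    w = [wi + alpha * (delta + q - q_old) * zi - alpha * (q - q_old) * xi
         for wi, zi, xi in zip(w, z, x)]
    return w, z, q_next
```

In an episode loop, `w` persists across episodes, while `z` and `q_old` would be reset to zeros at the start of each episode and the function called once per transition with features from `rbf_features`.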


Bibliographic details

Source: Computer Science and Information Systems, 2017, Issue 3
Authors: Fei Zhu; Haijun Zhu; Yuchen Fu; Xiaoke Zhou
Original format: PDF
Chinese Library Classification: Library science; librarianship
Keywords: reinforcement learning; kernel method; true online; policy gradient; Sarsa(λ)

