Journal: Computer Science and Information Systems

A kernel based true online Sarsa(λ) for continuous space control problems



Abstract

Reinforcement learning is an efficient approach to control problems: the agent interacts with the environment to learn an optimal policy. However, it faces challenges such as low convergence accuracy and slow convergence. Moreover, conventional reinforcement learning algorithms can hardly solve continuous control problems. Kernel-based methods can accelerate convergence and improve convergence accuracy, and the policy gradient method is a good way to deal with continuous space problems. We propose a Sarsa(λ) version of the true online temporal difference algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), built on a clustering-based sample specification method and a selective kernel-based value function. The TOSarsa(λ) algorithm gives results consistent with both the forward view and the backward view, which ensures that an optimal policy is obtained in less time. We also combine TOSarsa(λ) with heuristic dynamic programming. Experiments show that the proposed algorithm works well on continuous control problems.
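
As a rough illustration of the kind of update the abstract refers to, the sketch below shows one step of the standard textbook true online Sarsa(λ) with a dutch eligibility trace, applied to Gaussian kernel features of a continuous state. This is not the authors' TOSarsa(λ): the clustering-based sample specification step and the selective kernel are omitted, and every name here (`gaussian_kernel_features`, `true_online_sarsa_step`, `centers`, `width`) is a hypothetical choice made for the example.

```python
import math


def gaussian_kernel_features(state, centers, width):
    """Assumed feature map: one Gaussian kernel value per center.

    feature_i = exp(-||state - center_i||^2 / (2 * width^2))
    """
    return [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, center)) / (2.0 * width ** 2))
        for center in centers
    ]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def true_online_sarsa_step(w, z, q_old, x, x_next, reward, alpha, gamma, lam):
    """One true online Sarsa(lambda) update with linear function approximation.

    w       weight vector
    z       dutch eligibility trace (all zeros at the start of an episode)
    q_old   value carried over from the previous step (0 at episode start)
    x       features of the current state-action pair
    x_next  features of the next state-action pair (all zeros if terminal)
    Returns the updated (w, z, q_old).
    """
    q = dot(w, x)
    q_next = dot(w, x_next)
    delta = reward + gamma * q_next - q  # temporal difference error

    # Dutch trace: decay the old trace, then add a corrected copy of x.
    zx = dot(z, x)
    z = [
        gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
        for zi, xi in zip(z, x)
    ]

    # Weight update with the extra (q - q_old) correction terms that
    # distinguish the true online variant from conventional Sarsa(lambda).
    w = [
        wi + alpha * (delta + q - q_old) * zi - alpha * (q - q_old) * xi
        for wi, zi, xi in zip(w, z, x)
    ]
    return w, z, q_next
```

In an episode loop, `w` persists across episodes while `z` and `q_old` are reset to zero at each start. With `lam = 0` the step reduces to ordinary one-step Sarsa, which is a quick sanity check on the update.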
