Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection



Abstract

As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL remains an open problem, and existing RL algorithms have difficulty solving Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for MDPs with both continuous state and action spaces. In CAPI, based on value functions estimated by temporal-difference learning, a fast policy search technique is proposed to search for optimal actions in continuous action spaces; the technique is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently, both for linear function approximators and for kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy within a few iterations but also achieves performance comparable to or better than Sarsa-learning and previous approximate policy iteration methods such as LSPI and KLSPI.
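
As an illustration of the kind of greedy action selection over a continuous action space that the abstract describes, the sketch below performs a coarse-grid search over a one-dimensional action interval followed by local refinement around the best candidate, given a learned action-value function. The function `q`, the interval bounds, and the coarse-grid-plus-refinement scheme are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# A minimal sketch of fast greedy action selection in a continuous
# action interval [a_low, a_high], assuming a learned action-value
# function q(state, action). Illustrative only; not the paper's method.
import numpy as np

def fast_policy_search(q, state, a_low, a_high, n_coarse=11, n_refine=3):
    """Return an approximately greedy action in [a_low, a_high]."""
    lo, hi = a_low, a_high
    for _ in range(n_refine):
        candidates = np.linspace(lo, hi, n_coarse)
        values = np.array([q(state, a) for a in candidates])
        best = candidates[np.argmax(values)]
        # Shrink the search interval around the current best action.
        step = (hi - lo) / (n_coarse - 1)
        lo, hi = max(a_low, best - step), min(a_high, best + step)
    return best

if __name__ == "__main__":
    # Toy Q function with a known maximizer at a = 0.3 for any state.
    q = lambda s, a: -(a - 0.3) ** 2
    print(fast_policy_search(q, state=None, a_low=-1.0, a_high=1.0))
```

Because each refinement pass only evaluates `n_coarse` candidates, the cost per decision stays constant regardless of how finely the action space would otherwise need to be discretized.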
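The abstract also mentions adaptive basis function selection for obtaining sparse value-function approximations with kernel machines. A standard sparsification technique of this kind, used in the cited KLSPI baseline, is the approximate linear dependence (ALD) test; the sketch below implements an ALD-style dictionary builder as a plausible stand-in. The RBF kernel, the threshold `nu`, and the assumption that CAPI's kernel-based selection resembles ALD are all assumptions of this sketch.

```python
# A sketch of ALD-style (approximate linear dependence) dictionary
# construction, a standard sparsification method for kernel machines.
# Whether CAPI's basis selection matches this exactly is an assumption.
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two sample vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def build_dictionary(samples, nu=0.1, kernel=rbf):
    """Keep only samples whose kernel features are not approximately
    linearly dependent on those of the current dictionary."""
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)
            continue
        K = np.array([[kernel(xi, xj) for xj in dictionary]
                      for xi in dictionary])
        k = np.array([kernel(xi, x) for xi in dictionary])
        # ALD test: residual of projecting phi(x) onto the span of the
        # dictionary's kernel features (small jitter for stability).
        c = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
        delta = kernel(x, x) - k @ c
        if delta > nu:
            dictionary.append(x)
    return dictionary
```

A larger threshold `nu` yields a smaller dictionary and hence a sparser, cheaper approximation, at the cost of representational accuracy.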

