Sparse Gradient-Based Direct Policy Search



Abstract

Reinforcement learning is challenging when state and action spaces are continuous. The discretization of state and action spaces, and the real-time adaptation of that discretization, are critical issues in reinforcement learning problems. In our contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L_1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L_1 norm. We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adaptively updates the discretization of the states, and to classic as well as sparse Q-learning with linear function approximation. Our experiments on standard reinforcement learning benchmarks demonstrate that the proposed approach is efficient.
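
The abstract does not give implementation details. As an illustration only, the following minimal Python sketch shows one way such an approach could look: a REINFORCE-style policy gradient over a finely discretized one-dimensional state space, with sparsity induced by an L_1 proximal step (soft-thresholding) on the policy parameters. The toy environment, one-hot feature map, and all hyperparameters (N_BINS, ALPHA, LAMBDA, HORIZON) are assumptions made for demonstration and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch only: policy gradient over a fine state discretization
# with an L1 proximal step. Environment and hyperparameters are assumed,
# not the authors' exact setup.

rng = np.random.default_rng(0)

N_BINS = 50                        # fine discretization of the state in [-1, 1]
ACTIONS = np.array([-0.1, 0.1])    # two discrete actions: move left / right
ALPHA = 0.05                       # learning rate
LAMBDA = 1e-3                      # L1 penalty strength
HORIZON = 30

def features(s):
    """One-hot encoding of a continuous state s in [-1, 1] over N_BINS cells."""
    idx = min(int((s + 1.0) / 2.0 * N_BINS), N_BINS - 1)
    phi = np.zeros(N_BINS)
    phi[idx] = 1.0
    return phi

def policy_probs(theta, phi):
    """Softmax policy, linear in the discretized state features."""
    logits = phi @ theta            # shape: (num_actions,)
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def rollout(theta):
    """One episode of a toy 1-D task: start at 0, reward for reaching s > 0.5."""
    s, traj, ret = 0.0, [], 0.0
    for _ in range(HORIZON):
        phi = features(s)
        p = policy_probs(theta, phi)
        a = rng.choice(len(ACTIONS), p=p)
        traj.append((phi, a))
        s = np.clip(s + ACTIONS[a] + 0.01 * rng.standard_normal(), -1.0, 1.0)
        if s > 0.5:
            ret = 1.0
            break
    return traj, ret

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm: shrinks small weights exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

theta = np.zeros((N_BINS, len(ACTIONS)))
for episode in range(2000):
    traj, ret = rollout(theta)
    grad = np.zeros_like(theta)
    for phi, a in traj:
        p = policy_probs(theta, phi)
        # REINFORCE: grad of log softmax is phi outer (one_hot(a) - p), scaled by return
        grad += np.outer(phi, np.eye(len(ACTIONS))[a] - p) * ret
    # gradient ascent step followed by the L1 proximal step
    theta = soft_threshold(theta + ALPHA * grad, ALPHA * LAMBDA)

print("non-zero parameters:", np.count_nonzero(theta), "of", theta.size)
```

The proximal step is what yields exact zeros in this sketch: a plain subgradient on the L_1 term would only shrink weights toward zero, whereas soft-thresholding switches off cells of the fine discretization that receive no gradient support, which is the sparsity effect the abstract describes.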
