Journal: IEEE Transactions on Cybernetics

Policy Search for the Optimal Control of Markov Decision Processes: A Novel Particle-Based Iterative Scheme

Abstract

Classical approximate dynamic programming techniques based on state-space gridding become computationally impractical for high-dimensional problems. Policy search techniques cope with this curse of dimensionality by searching for the optimal control policy in a restricted, parameterized policy space. Here we focus on the case of a discrete action space and introduce a novel policy parameterization that uses particles to describe the map from the state space to the action space, with each particle representing a region of the state space that is mapped to a certain action. The locations and actions associated with the particles describing a policy can be tuned by means of a recently introduced policy gradient method with parameter-based exploration. The task of selecting an appropriately sized set of particles is solved through an iterative policy-building scheme that adds new particles to improve policy performance and can also remove redundant particles. Experiments demonstrate the scalability of the proposed approach as the dimensionality of the state space grows.
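
As a rough illustration of the particle-based parameterization described in the abstract, the sketch below maps a state to the action of its nearest particle. The abstract does not specify how a particle's region is defined, so the nearest-particle (Euclidean) rule, the function name `particle_policy`, and the sample data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def particle_policy(state, locations, actions):
    """Return the action attached to the particle closest to `state`.

    Assumption: each particle's region is the set of states nearer to it
    than to any other particle under Euclidean distance.
    """
    state = np.asarray(state, dtype=float)
    locations = np.asarray(locations, dtype=float)
    # Distance from the query state to every particle location.
    dists = np.linalg.norm(locations - state, axis=1)
    return actions[int(np.argmin(dists))]

# Hypothetical 2-D state space with three particles and actions in {0, 1}.
locations = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
actions = [0, 1, 0]
chosen = particle_policy([0.9, 0.1], locations, actions)
```

Under this assumed rule, the tunable policy parameters are the particle locations and their discrete actions, so adding or removing a particle changes the partition of the state space without requiring a grid.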