MICAI 2009: Advances in Artificial Intelligence

Cosine Policy Iteration for Solving Infinite-Horizon Markov Decision Processes



Abstract

Policy Iteration (PI) is a widely used traditional method for solving Markov Decision Processes (MDPs). In this paper, the Cosine Policy Iteration (CPI) method is proposed for solving complex problems formulated as infinite-horizon MDPs. CPI combines the advantages of two methods: i) the Cosine Simplex Method (CSM), which is based on the Karush-Kuhn-Tucker (KKT) optimality conditions and rapidly finds an initial policy close to the optimal solution, and ii) PI, which is able to reach the global optimum. In order to apply CSM to this kind of problem, a well-known LP formulation of MDPs is used and its particular features are derived in this paper. The results obtained show that CPI solves MDPs in fewer iterations than traditional PI.
