Journal: Naval Research Logistics

Technical Note: A Computationally Efficient Algorithm For Undiscounted Markov Decision Processes With Restricted Observations

Abstract

We present a computationally efficient procedure for determining control policies for an infinite-horizon Markov decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process rather than of the unobservable states of the system; the policy is therefore stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average-cost case and combines a local search with a modified version of Howard's policy iteration method (Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960). We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large-scale problem instances, we demonstrate that the average cost associated with the locally optimal policy is lower than the average cost associated with an integer-rounded policy produced by the algorithm of Serin and Kulkarni (Math Methods Oper Res 61 (2005), 311-328).
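To make the setting concrete, the following is a minimal sketch of what a policy that depends only on observations looks like and how a plain local search over such policies might proceed. It is not the authors' algorithm: the abstract describes a local search combined with a modified policy iteration, whereas this sketch evaluates each candidate by approximating the stationary distribution and tries one-observation-at-a-time action changes. All names (`obs_of_state`, `average_cost`, `local_search`) and the three-state example data are hypothetical, and every induced Markov chain is assumed to be unichain.

```python
from itertools import product


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Assumes the chain is unichain. A "lazy" update (averaging with the
    current vector) is used so that periodic chains also converge.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def average_cost(policy, obs_of_state, P, c):
    """Long-run average cost of a deterministic observation-based policy.

    policy[o] is the action taken whenever observation o is seen, so all
    states sharing an observation are forced to use the same action.
    P[a][s][t] is the transition probability, c[a][s] the one-step cost.
    """
    n = len(obs_of_state)
    acts = [policy[obs_of_state[s]] for s in range(n)]
    P_pi = [P[acts[s]][s] for s in range(n)]
    pi = stationary_distribution(P_pi)
    return sum(pi[s] * c[acts[s]][s] for s in range(n))


def local_search(policy, obs_of_state, P, c, eps=1e-9):
    """Change one observation's action at a time until no change helps."""
    n_actions = len(P)
    best = average_cost(policy, obs_of_state, P, c)
    improved = True
    while improved:
        improved = False
        for o in range(len(policy)):
            for a in range(n_actions):
                if a == policy[o]:
                    continue
                cand = policy[:o] + (a,) + policy[o + 1:]
                g = average_cost(cand, obs_of_state, P, c)
                if g < best - eps:
                    policy, best, improved = cand, g, True
    return policy, best


def brute_force(obs_of_state, P, c):
    """Enumerate every deterministic observation-based policy (small cases only)."""
    n_obs = max(obs_of_state) + 1
    candidates = product(range(len(P)), repeat=n_obs)
    return min((average_cost(p, obs_of_state, P, c), p) for p in candidates)


# Hypothetical example: states 0 and 1 are indistinguishable (observation 0),
# state 2 has its own observation (observation 1); two actions.
obs_of_state = [0, 0, 1]
P = [
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],  # action 0
    [[0.1, 0.6, 0.3], [0.5, 0.1, 0.4], [0.6, 0.3, 0.1]],  # action 1
]
c = [
    [1.0, 4.0, 2.0],  # cost of action 0 in states 0, 1, 2
    [3.0, 0.5, 2.5],  # cost of action 1 in states 0, 1, 2
]

local_policy, local_cost = local_search((0, 0), obs_of_state, P, c)
global_cost, global_policy = brute_force(obs_of_state, P, c)
print("local search:", local_policy, round(local_cost, 4))
print("enumeration: ", global_policy, round(global_cost, 4))
```

Comparing the local-search result with full enumeration on small instances mirrors, in spirit, the kind of empirical check the abstract reports; enumeration grows exponentially in the number of observations, which is why a local method is attractive for larger problems.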