Journal: Automatica

Actor-critic algorithms for hierarchical Markov decision processes

Abstract

We consider the problem of controlling hierarchical Markov decision processes and develop a simulation-based two-timescale actor-critic algorithm in a general framework. We also develop approximation algorithms that require less computation and satisfy a performance bound. One of the approximation algorithms is a three-timescale actor-critic algorithm, while the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms. We then present numerical experiments on a production planning problem in semiconductor fabs, on which we compare the performance of all the algorithms together with policy iteration. Algorithms that use deterministic perturbations based on certain Hadamard matrices are found to show the best results. (c) 2006 Elsevier Ltd. All rights reserved.
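
The abstract names SPSA as the mechanism behind every policy update. As a rough illustration of that general technique only, the sketch below runs textbook two-sided SPSA with random ±1 perturbations on a toy quadratic. It is not the paper's actor-critic algorithm: the objective `quadratic`, the helper names, the gain constants, and the use of random rather than Hadamard-based perturbations are all assumptions made for this example.

```python
import random


def spsa_gradient(f, theta, c, rng):
    """Two-sided SPSA estimate of the gradient of f at theta.

    All coordinates are perturbed at once by a random +/-1 vector, so a
    single estimate costs two evaluations of f regardless of dimension.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]


def spsa_minimize(f, theta0, iterations=2000, a=0.1, c=0.1, seed=0):
    """Stochastic-approximation descent driven by SPSA gradient estimates.

    The decaying gains a_k and c_k use commonly quoted exponents; they are
    illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size
        c_k = c / k ** 0.101  # perturbation size
        g = spsa_gradient(f, theta, c_k, rng)
        theta = [t - a_k * gi for t, gi in zip(theta, g)]
    return theta


TARGET = (1.0, -2.0, 0.5)  # hypothetical minimizer of the toy objective


def quadratic(theta):
    """Toy objective: squared distance to TARGET."""
    return sum((t - s) ** 2 for t, s in zip(theta, TARGET))


theta_hat = spsa_minimize(quadratic, [0.0, 0.0, 0.0])
print(theta_hat)
```

In the setting the abstract describes, the function evaluations would come from noisy simulation of the decision process and the parameter would be a randomized policy; the toy objective here only shows the shape of the perturb, evaluate, and update loop.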
