Journal: 《自动化学报》 (Acta Automatica Sinica)

A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies

         

Abstract

Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into a uniformized Markov chain and simulating a single sample path of that chain. The goal is to find a suboptimal randomized stationary policy. The algorithm is suited to the performance optimization of many difficult systems with large-scale state spaces. Finally, a numerical example for a controlled Markov process is provided.
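The abstract compresses three steps: uniformization, single-path gradient estimation, and policy improvement. The sketch below illustrates them on a toy model under stated assumptions. It is not the authors' algorithm. The softmax parameterization (`softmax_policy`), the GPOMDP-style discounted eligibility-trace estimator (`simulate_gradient`), the plain gradient-descent update and step size (`improve_policy`), and the 3-state model (`example_problem`) are all hypothetical choices made for illustration. Uniformization picks a constant lam at least as large as every total exit rate and sets P = I + Q/lam. This leaves the stationary distribution unchanged, so the average cost per step of the uniformized chain, charging the cost rate f(i,a) at each step, equals the long-run average cost rate of the CTMDP. For checking, `exact_cost_and_gradient` evaluates the potential-based formula grad eta = sum_i pi(i) sum_a grad mu(a|i) [f(i,a) + sum_j p(j|i,a) g(j)], where g solves the Poisson equation. This is feasible only for small state spaces.

```python
import numpy as np


def softmax_policy(theta):
    """mu[i, a] = probability of action a in state i (assumed softmax parameterization)."""
    w = np.exp(theta - theta.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


def uniformize(rates):
    """Map CTMDP rates q[i, a, j] to a uniformized DTMC kernel P[i, a, j] and constant lam."""
    n = rates.shape[0]
    q = rates.astype(float).copy()
    for i in range(n):
        q[i, :, i] = 0.0                        # self-rates are ignored
    exit_rate = q.sum(axis=2)                   # total exit rate of each (i, a)
    lam = exit_rate.max()                       # any lam >= max exit rate works
    P = q / lam
    for i in range(n):
        P[i, :, i] = 1.0 - exit_rate[i] / lam   # fictitious self-transitions
    return P, lam


def exact_cost_and_gradient(P, f, theta):
    """Average cost eta and d eta / d theta via performance potentials (small models only)."""
    n = f.shape[0]
    mu = softmax_policy(theta)
    P_mu = np.einsum("ia,iaj->ij", mu, P)       # transition matrix under the policy
    f_mu = (mu * f).sum(axis=1)                 # expected cost in each state
    M = np.vstack([(np.eye(n) - P_mu).T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi = np.linalg.lstsq(M, rhs, rcond=None)[0]  # stationary distribution
    eta = pi @ f_mu
    # Potentials: (I - P + e pi) g = f  implies  (I - P) g = f - eta e.
    g = np.linalg.solve(np.eye(n) - P_mu + np.outer(np.ones(n), pi), f_mu)
    q_val = f + P @ g                           # f(i,a) + sum_j P(j|i,a) g(j)
    adv = q_val - (mu * q_val).sum(axis=1, keepdims=True)
    # Softmax: d mu(a|i) / d theta(i,b) = mu(a|i) (delta_ab - mu(b|i)).
    return eta, pi[:, None] * mu * adv


def simulate_gradient(P, f, theta, T, beta, rng, x0=0):
    """Single-sample-path estimate of (eta, d eta / d theta) on the uniformized chain.

    Likelihood-ratio estimator with a discounted trace z; beta < 1 trades
    bias for variance (the bias vanishes as beta -> 1).
    """
    n, A = f.shape
    mu = softmax_policy(theta)
    cum_mu = mu.cumsum(axis=1)
    cum_P = P.cumsum(axis=2)
    z = np.zeros_like(theta, dtype=float)
    grad = np.zeros_like(theta, dtype=float)
    eta_hat, x = 0.0, x0
    for t in range(T):
        a = min(int(np.searchsorted(cum_mu[x], rng.random(), side="right")), A - 1)
        c = f[x, a]
        z *= beta
        z[x] -= mu[x]                           # grad log mu(a|x) = e_a - mu[x] on row x
        z[x, a] += 1.0
        grad += (c - eta_hat) * z               # running average cost as a baseline
        eta_hat += (c - eta_hat) / (t + 1)
        x = min(int(np.searchsorted(cum_P[x, a], rng.random(), side="right")), n - 1)
    return eta_hat, grad / T


def improve_policy(P, f, theta, steps=20, T=5000, beta=0.9, step_size=0.5, seed=0):
    """Plain stochastic gradient descent on theta (schedule and step size are assumptions)."""
    rng = np.random.default_rng(seed)
    theta = theta.astype(float).copy()
    for _ in range(steps):
        _, grad = simulate_gradient(P, f, theta, T, beta, rng)
        theta -= step_size * grad
    return theta


def example_problem():
    """A hypothetical 3-state, 2-action CTMDP: rates[i, a, j] and cost rates f[i, a]."""
    rates = np.array([
        [[0.0, 1.0, 0.5], [0.0, 2.0, 0.0]],
        [[1.0, 0.0, 1.0], [0.5, 0.0, 2.0]],
        [[2.0, 0.5, 0.0], [1.0, 1.0, 0.0]],
    ])
    f = np.array([[1.0, 2.0], [3.0, 0.5], [0.0, 1.5]])
    return rates, f


if __name__ == "__main__":
    rates, f = example_problem()
    P, lam = uniformize(rates)
    theta0 = np.zeros_like(f)
    print("exact eta at theta0:", exact_cost_and_gradient(P, f, theta0)[0])
    theta = improve_policy(P, f, theta0)
    print("exact eta after SGD:", exact_cost_and_gradient(P, f, theta)[0])
```

Only `simulate_gradient` touches the sample path. Its cost per step does not depend on the size of the state space, which is the property that matters for large models. The exact routine is there only to validate the estimator on a toy instance. Because beta < 1 biases the estimate, the policy found this way should be read as suboptimal, in line with the goal stated in the abstract.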
