This paper analyzes the model features of a widely occurring class of event-driven Markov decision processes with the average-cost criterion and, on that basis, presents a simple reinforcement learning algorithm. Rather than expanding events into system states, the algorithm learns the value function of the original system states only, which reduces both computation and data storage. The algorithm is applied to the admission control problem of an M/M/1 queueing system. Computer simulation results show that it outperforms the ordinary reinforcement learning algorithm and the dynamic programming algorithm, which verifies its effectiveness.
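The idea of learning values only for the original states can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes a uniformized M/M/1 queue with a finite buffer, a holding cost per customer per step, a fixed cost per rejected customer, and a tabular average-cost temporal-difference update. The function name `learn_admission_policy` and every parameter and default value are hypothetical choices made for illustration.

```python
import random


def learn_admission_policy(lam=0.8, mu=1.0, capacity=20, hold_cost=1.0,
                           reject_cost=5.0, steps=200_000, alpha=0.05,
                           beta=0.001, epsilon=0.1, seed=0):
    """Average-cost TD learning on queue lengths only (illustrative sketch).

    All names, costs, and step sizes are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Uniformization: each step is an arrival with this probability,
    # otherwise a departure (a no-op when the queue is empty).
    p_arrival = lam / (lam + mu)
    # Relative values are stored per queue length n, not per (n, event) pair.
    h = [0.0] * (capacity + 1)
    rho = 0.0  # running estimate of the average cost per step
    n = 0
    for _ in range(steps):
        cost = hold_cost * n
        if rng.random() < p_arrival:
            # Arrival event: the only point where a decision is made.
            if n >= capacity:
                accept = False
            elif rng.random() < epsilon:
                accept = rng.random() < 0.5  # exploratory action
            else:
                # Greedy rule that compares state values directly.
                accept = h[n + 1] <= reject_cost + h[n]
            if accept:
                nxt = n + 1
            else:
                nxt = n
                cost += reject_cost
        else:
            # Departure event: no decision is needed.
            nxt = max(n - 1, 0)
        # Average-cost TD(0) update of the relative value of state n.
        h[n] += alpha * (cost - rho + h[nxt] - h[n])
        # Simple running average of observed costs, exploration included.
        rho += beta * (cost - rho)
        n = nxt
    # Smallest queue length at which the greedy rule rejects an arrival.
    threshold = next((k for k in range(capacity)
                      if h[k + 1] > reject_cost + h[k]), capacity)
    return h, rho, threshold
```

Calling `learn_admission_policy()` returns the learned relative values, the average-cost estimate, and the queue length at which the greedy rule starts rejecting arrivals. The table has `capacity + 1` entries; expanding events into states would require one entry per combination of queue length and event type.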