首页> 中文期刊> 《控制理论与应用:英文版》 >Semi-Markov adaptive critic heuristics with application to airline revenue management

Semi-Markov adaptive critic heuristics with application to airline revenue management

         

摘要

The adaptive critic heuristic has been a popular algorithm in reinforcement learning(RL) and approximate dynamic programming(ADP) alike.It is one of the first RL and ADP algorithms.RL and ADP algorithms are particularly useful for solving Markov decision processes(MDPs) that suffer from the curses of dimensionality and modeling.Many real-world problems,however,tend to be semi-Markov decision processes(SMDPs) in which the time spent in each transition of the underlying Markov chains is itself a random variable.Unfortunately for the average reward case,unlike the discounted reward case,the MDP does not have an easy extension to the SMDP.Examples of SMDPs can be found in the area of supply chain management,maintenance management,and airline revenue management.In this paper,we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion.We present the convergence analysis of the algorithm which shows that under certain mild conditions,which can be ensured within a simulator,the algorithm converges to an optimal solution with probability 1.We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon.The problem has a large scale,suffering from the curse of dimensionality,and hence it is difficult to solve it via classical methods of dynamic programming.Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号