Journal: Engineering Applications of Artificial Intelligence

A bounded actor-critic reinforcement learning algorithm applied to airline revenue management



Abstract

Reinforcement Learning (RL) is an artificial intelligence technique used to solve Markov and semi-Markov decision processes. Actor-critic algorithms form a major class of RL algorithms, but they suffer from a critical deficiency: the values of the so-called actor can become very large, causing computer overflow. In practice, therefore, one has to constrain these values artificially via a projection, and at times also use temperature-reduction tuning parameters in the popular Boltzmann action-selection schemes, to make the algorithm deliver acceptable results. This artificial bounding and temperature reduction, however, do not allow for full exploration of the state space, which often leads to sub-optimal solutions on large-scale problems. We propose a new actor-critic algorithm in which (i) the actor's values remain bounded without any projection and (ii) no temperature-reduction tuning parameter is needed. The algorithm also represents a significant improvement over a recent version in the literature, in which the values remain bounded but usually become very large in magnitude, necessitating the use of a temperature-reduction parameter. Our new algorithm is tested on an important problem in an area of management science known as airline revenue management, where the state space is very large. The algorithm delivers encouraging computational behavior, outperforming a well-known industrial heuristic called EMSR-b on industrial data.
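
The overflow issue and the projection workaround mentioned in the abstract can be illustrated with a small Python sketch. This is not the algorithm proposed in the paper, whose update rules are not given here. The function names `boltzmann_probs` and `project`, the bound of 5.0, and the sample actor values are all hypothetical choices made for illustration.

```python
import math


def boltzmann_probs(actor_values, temperature=1.0):
    """Boltzmann (softmax) action probabilities from a list of actor values.

    The maximum is subtracted before exponentiating so that math.exp never
    receives a large positive argument. This is a standard numerical trick,
    assumed here for illustration; it is not taken from the paper.
    """
    scaled = [v / temperature for v in actor_values]
    largest = max(scaled)
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def project(actor_values, bound):
    """Clip every actor value into [-bound, bound] (a simple projection)."""
    return [max(-bound, min(bound, v)) for v in actor_values]


if __name__ == "__main__":
    # Hypothetical actor values for three actions in one state.
    values = [800.0, -900.0, 3.0]

    # A naive Boltzmann weight exp(value) overflows for large actor values.
    try:
        math.exp(1000.0)
    except OverflowError as err:
        print("naive exp overflowed:", err)

    print("unprojected:", boltzmann_probs(values))
    print("projected:  ", boltzmann_probs(project(values, 5.0)))
```

In this toy setting the unprojected values make the selection effectively deterministic, while the projected values keep every action's probability strictly positive. The choice of bound is arbitrary, which is the kind of artificial tuning the abstract says the proposed algorithm avoids.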
