Journal: Machine Learning

A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis

Abstract

We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems. In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared. Our algorithm is an asynchronous, model-free algorithm (which can be used on large-scale problems) that hinges on the idea of computing the value function of a given policy and searching over policy space. In the applied operations research community, RL has been used to derive good solutions to problems previously considered intractable. Hence in this paper, we have tested the proposed algorithm on a commercially significant case study related to a real-world problem from the airline industry. It focuses on yield management, which has been hailed as the key factor for generating profits in the airline industry. In the experiments conducted, we use our algorithm with a nearest-neighbor approach to tackle a large state space. We also present a convergence analysis of the algorithm via an ordinary differential equation method.
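
The idea of computing the value function of a given policy and then searching over policy space can be illustrated with classical, model-based policy iteration for average reward. The sketch below is not the model-free, asynchronous algorithm the abstract describes. It assumes the transition probabilities are known and that every policy induces a single recurrent class. The two-state MDP, its numbers, and all function names are hypothetical, chosen only for illustration.

```python
# Classical (model-based) average-reward policy iteration on a toy MDP.
# All numbers are hypothetical and are not taken from the paper.
# P[s][a][s2] = transition probability, R[s][a] = expected immediate reward.
P = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
R = [[1.0, 0.0],
     [2.0, 3.0]]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                f = M[i][col] / M[col][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy):
    """Policy evaluation: solve rho + h(s) = r(s) + sum_s2 P(s, s2) h(s2),
    with h(0) fixed to 0. Unknowns are [rho, h(1), ..., h(n-1)]."""
    n = len(policy)
    A, b = [], []
    for s, a in enumerate(policy):
        row = [1.0] + [(1.0 if j == s else 0.0) - P[s][a][j]
                       for j in range(1, n)]
        A.append(row)
        b.append(R[s][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]


def improve(policy, h):
    """Policy improvement: act greedily with respect to h, keeping the
    current action on ties so the iteration cannot cycle."""
    new_policy = []
    for s, current in enumerate(policy):
        q = [R[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
             for a in range(len(P[s]))]
        best = max(range(len(q)), key=q.__getitem__)
        new_policy.append(current if q[current] >= q[best] - 1e-12 else best)
    return new_policy


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        rho, h = evaluate(policy)
        new_policy = improve(policy, h)
        if new_policy == policy:
            return policy, rho
        policy = new_policy


policy, rho = policy_iteration([0, 0])
print(policy, rho)
```

On this toy problem the iteration stops after one improvement step, at the policy that takes action 0 in state 0 and action 1 in state 1, with an average reward of 12/7 per step. A model-free method would instead estimate these quantities from sampled transitions.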