Operations Research: The Journal of the Operations Research Society of America

Decomposable Markov Decision Processes: A Fluid Optimization Approach

Abstract

Decomposable Markov decision processes (MDPs) are problems where the stochastic system can be decomposed into multiple individual components. Although such MDPs arise naturally in many practical applications, they are often difficult to solve exactly due to the enormous size of the state space of the complete system, which grows exponentially with the number of components. In this paper, we propose an approximate solution approach to decomposable MDPs that is based on re-solving a fluid linear optimization formulation of the problem at each decision epoch. This formulation tractably approximates the problem by modeling transition behavior at the level of the individual components rather than the complete system. We prove that our fluid formulation provides a tighter bound on the optimal value function than three state-of-the-art formulations: the approximate linear optimization formulation, the classical Lagrangian relaxation formulation, and a novel, alternate Lagrangian relaxation that is based on relaxing an action consistency constraint. We provide a numerical demonstration of the effectiveness of the approach in the area of multiarmed bandit problems, where we show that our approach provides near-optimal performance and outperforms state-of-the-art algorithms.
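The state-space growth mentioned in the abstract can be illustrated with a small counting sketch. This is not the paper's fluid formulation, only an illustration under assumed inputs: the function names and the choice of 10 states per component are hypothetical. It compares the number of joint states of the complete system (the product of the component state-space sizes) with the number of (component, local state) pairs (the sum of those sizes), which is the scale at which a component-level model operates.

```python
from math import prod


def joint_state_count(component_sizes):
    """States of the complete system: product of the component state-space sizes."""
    return prod(component_sizes)


def component_state_count(component_sizes):
    """Number of (component, local state) pairs: sum of the component sizes."""
    return sum(component_sizes)


if __name__ == "__main__":
    # Hypothetical example: every component has 10 local states.
    for n in (2, 5, 10, 20):
        sizes = [10] * n
        print(n, joint_state_count(sizes), component_state_count(sizes))
```

With 20 such components the complete system has 10^20 joint states, while there are only 200 component-level states, which suggests why modeling transitions per component can remain tractable.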
