Mathematics of Operations Research

Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time



Abstract

We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value-policy duality, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly linear runtime in the worst case for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in some special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ε-optimal policy using sample size and time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new venue and complexity benchmarks for solving stochastic dynamic programs.
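The abstract's core idea can be illustrated with a minimal sketch: maintain a primal value vector and a dual distribution over state-action pairs, adaptively sample state-action-state transitions, and apply an exponentiated (multiplicative-weight) update to the dual alongside a stochastic gradient step on the primal. The toy MDP, step sizes, and iteration count below are hypothetical illustration choices, not the paper's algorithm or its analyzed parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy discounted MDP: 3 states, 2 actions.
nS, nA, gamma = 3, 2, 0.8
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = dist. over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # rewards in [0, 1]
rho = np.full(nS, 1.0 / nS)                    # initial-state distribution

v = np.zeros(nS)                               # primal: value estimates
mu = np.full((nS, nA), 1.0 / (nS * nA))        # dual: state-action distribution
alpha, beta = 0.05, 0.05                       # step sizes (assumed)

for _ in range(20000):
    # Adaptively sample a state-action pair from the current dual iterate,
    # then a next state from the transition kernel (only a sampler is needed,
    # not the full matrix P).
    idx = rng.choice(nS * nA, p=mu.ravel())
    s, a = divmod(idx, nA)
    s_next = rng.choice(nS, p=P[s, a])

    # Sampled Bellman residual: a stochastic dual gradient at (s, a).
    delta = R[s, a] + gamma * v[s_next] - v[s]

    # Exponentiated (multiplicative-weight) dual update, then renormalize.
    mu[s, a] *= np.exp(beta * delta)
    mu /= mu.sum()

    # Stochastic primal (value) update from the same sampled transition.
    grad_v = (1.0 - gamma) * rho
    grad_v[s] -= 1.0
    grad_v[s_next] += gamma
    v -= alpha * grad_v

# Read off a greedy policy from the dual variable.
pi = mu.argmax(axis=1)
```

The design point being sketched is that each iteration touches only one sampled transition, so per-iteration cost is independent of the number of states; the paper's contribution is showing that a scheme of this flavor attains nearly linear (and in special cases sublinear) total runtime.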


