Mathematics of Operations Research

Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time



Abstract

We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value-policy duality, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy using nearly linear runtime in the worst case for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in some special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ε-optimal policy using sample size and time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new venue and complexity benchmarks for solving stochastic dynamic programs.
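The abstract's core idea can be illustrated with a minimal sketch: maintain a primal value vector and a dual distribution over state-action pairs, adaptively sample state-action-state transitions, and apply an exponentiated (multiplicative-weight) update to the dual alongside a stochastic gradient step on the primal. The toy MDP, step sizes, and iteration count below are hypothetical illustration choices, not the paper's algorithm or its analyzed parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy discounted MDP: 3 states, 2 actions.
nS, nA, gamma = 3, 2, 0.8
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = dist. over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # rewards in [0, 1]
rho = np.full(nS, 1.0 / nS)                    # initial-state distribution

v = np.zeros(nS)                               # primal: value estimates
mu = np.full((nS, nA), 1.0 / (nS * nA))        # dual: state-action distribution
alpha, beta = 0.05, 0.05                       # step sizes (assumed)

for _ in range(20000):
    # Adaptively sample a state-action pair from the current dual iterate,
    # then a next state from the transition kernel (only a sampler is needed,
    # not the full matrix P).
    idx = rng.choice(nS * nA, p=mu.ravel())
    s, a = divmod(idx, nA)
    s_next = rng.choice(nS, p=P[s, a])

    # Sampled Bellman residual: a stochastic dual gradient at (s, a).
    delta = R[s, a] + gamma * v[s_next] - v[s]

    # Exponentiated (multiplicative-weight) dual update, then renormalize.
    mu[s, a] *= np.exp(beta * delta)
    mu /= mu.sum()

    # Stochastic primal (value) update from the same sampled transition.
    grad_v = (1.0 - gamma) * rho
    grad_v[s] -= 1.0
    grad_v[s_next] += gamma
    v -= alpha * grad_v

# Read off a greedy policy from the dual variable.
pi = mu.argmax(axis=1)
```

The design point being sketched is that each iteration touches only one sampled transition, so per-iteration cost is independent of the number of states; the paper's contribution is showing that a scheme of this flavor attains nearly linear (and in special cases sublinear) total runtime.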


