IEEE Annual Conference on Decision and Control

Approximate Dynamic Programming with (min, +) linear function approximation for Markov decision processes



Abstract

The Markov Decision Process (MDP) is a useful framework for studying problems of optimal sequential decision making under uncertainty. Given any MDP, the aim here is to find the optimal action selection mechanism, i.e., the optimal policy. Typically, the optimal policy (u*) is obtained by substituting the optimal value function (J*) into the Bellman equation. Alternatively, u* can also be obtained by learning the optimal state-action value function Q*, known as the Q value function. However, it is difficult to compute the exact values of J* or Q* for MDPs with a large number of states. Approximate Dynamic Programming (ADP) methods address this difficulty by computing lower dimensional approximations of J*/Q*. Most ADP methods employ linear function approximation (LFA), i.e., the approximate solution lies in a subspace spanned by a family of pre-selected basis functions. The approximation is obtained via a linear least squares projection of higher dimensional quantities, and the L2 norm plays an important role in convergence and error analysis. In this paper, we discuss ADP methods for MDPs based on LFAs in the (min, +) algebra. Here the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. The approximation is obtained via a projection operator onto the subsemimodule, which is different from the linear least squares projection used in ADP methods based on conventional LFAs. MDPs are not (min, +) linear systems; nevertheless, we show that the monotonicity property of the projection operator helps us establish the convergence of our ADP schemes. We also discuss future directions in ADP methods for MDPs based on (min, +) LFAs.
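The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the general idea under assumptions of our own: a small randomly generated cost-minimization MDP, a handful of arbitrary basis functions phi_i, the standard (min, +) projection with coefficients r_i = max_s ( J(s) - phi_i(s) ) onto the subsemimodule they span, and a projected value iteration J_{n+1} = Projection(T J_n). None of these specifics (problem data, basis, or exact iteration) are taken from the paper itself.

```python
import numpy as np

# --- Hypothetical problem data (not from the paper): a small random
# --- cost-minimization MDP with n_states states and n_actions actions.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] = transition prob.
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # c[s, a]    = one-stage cost

def bellman(J):
    """Bellman operator T: (T J)(s) = min_a [ c(s,a) + gamma * sum_s' P(s'|s,a) J(s') ]."""
    return np.min(c + gamma * P @ J, axis=1)

# --- (min, +) linear function approximation.
# A (min, +) "linear combination" of basis functions phi_1..phi_k with
# coefficients r_1..r_k is V_r(s) = min_i ( r_i + phi_i(s) ); the set of all
# such V_r is the subsemimodule spanned by the phi_i.
k = 5
Phi = rng.uniform(0.0, 5.0, size=(n_states, k))  # Phi[s, i] = phi_i(s), an arbitrary basis

def minplus_combination(r):
    """Evaluate V_r(s) = min_i ( r_i + phi_i(s) )."""
    return np.min(Phi + r[None, :], axis=1)

def minplus_projection(J):
    """Project J onto the subsemimodule: with r_i = max_s ( J(s) - phi_i(s) ),
    V_r is the smallest element of the span that dominates J pointwise.
    The map J -> V_r is monotone and sup-norm non-expansive."""
    r = np.max(J[:, None] - Phi, axis=0)
    return minplus_combination(r)

# --- Projected value iteration: J_{n+1} = Projection(T J_n).
J_approx = np.zeros(n_states)
for _ in range(300):
    J_approx = minplus_projection(bellman(J_approx))

# --- Exact value iteration for comparison.
J_star = np.zeros(n_states)
for _ in range(300):
    J_star = bellman(J_star)

print("sup-norm approximation error:", np.max(np.abs(J_approx - J_star)))
```

Because the projection above is monotone and sup-norm non-expansive while the Bellman operator is a gamma-contraction, their composition is still a contraction, so this sketch's iteration converges; how close its fixed point is to J* depends entirely on how well the chosen basis covers J*.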
