SIAM Journal on Control and Optimization

A CONVEX PROGRAMMING APPROACH FOR DISCRETE-TIME MARKOV DECISION PROCESSES UNDER THE EXPECTED TOTAL REWARD CRITERION

Abstract

In this work, we study discrete-time Markov decision processes (MDPs) under constraints, with Borel state and action spaces, where all the performance functions take the same form, namely the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDP. It will be shown that the values of the constrained control problem and of the associated convex program coincide. Moreover, if there exists an optimal solution to the convex program, then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that, in the framework of constrained control problems, the supremum of the ETRs over the set of randomized policies is equal to the supremum of the ETRs over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to cover cases that have not yet been addressed in the literature. Examples are presented to illustrate our results.
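For orientation, the convex program in this type of setting is typically a linear program over occupation measures; the following is a minimal sketch of that standard formulation, with notation assumed here rather than taken from the paper (ν the initial distribution, Q the transition kernel, r the reward function, c_i and d_i the constraint functions and bounds):

\[
\sup_{\mu \ge 0} \int_{X \times A} r \, d\mu
\quad \text{s.t.} \quad
\mu(B \times A) = \nu(B) + \int_{X \times A} Q(B \mid x,a)\, \mu(dx,da) \;\; \forall B \in \mathcal{B}(X),
\qquad
\int_{X \times A} c_i \, d\mu \le d_i, \;\; i = 1, \dots, q.
\]

Here μ plays the role of the expected state-action occupation measure induced by a policy; when an optimal μ exists, disintegrating it as μ(dx,da) = \hat{\mu}(dx)\,\varphi(da \mid x) produces a stationary randomized policy φ, which is the mechanism behind the optimality statement in the abstract.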
