A unified approach for semi-Markov decision processes with discounted and average reward criteria

Abstract

Building on sensitivity-based optimization, we develop a unified optimization approach for semi-Markov decision processes (SMDPs) under infinite-horizon discounted and average reward criteria. We show that the sensitivity formula for the average reward criterion is a limiting case of the formula for the discounted reward criterion. Using these performance sensitivity formulas, we give a unified formulation of the policy iteration algorithms for SMDPs under both criteria.
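
The relationship between the two criteria can be illustrated numerically. The following is a minimal sketch, not the authors' formulation: it assumes a finite SMDP with exponentially distributed sojourn times, rewards accruing at a constant rate during each sojourn, and a unichain embedded chain. All names (`evaluate_discounted`, `average_reward`, `policy_iteration`, and the arrays `P`, `f`, `lam`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_discounted(P, f, lam, alpha):
    """Discounted value of a fixed policy (assumed model, see lead-in).

    P: (n, n) embedded-chain transition matrix, f: (n,) reward rates,
    lam: (n,) rates of the exponential sojourn times, alpha > 0: discount rate.
    """
    n = len(f)
    beta = lam / (alpha + lam)   # E[exp(-alpha * sojourn time)]
    r = f / (alpha + lam)        # expected discounted reward over one sojourn
    # Solve V = r + diag(beta) P V.
    return np.linalg.solve(np.eye(n) - beta[:, None] * P, r)

def average_reward(P, f, lam):
    """Long-run reward per unit time of a fixed policy (unichain assumed)."""
    n = len(f)
    # Stationary distribution of the embedded chain: pi P = pi, sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    tau = 1.0 / lam              # mean sojourn times
    return (pi @ (f * tau)) / (pi @ tau)

def policy_iteration(P, f, lam, alpha, max_iter=100):
    """Policy iteration for the discounted criterion.

    P: (n, m, n) with P[i, a, j], f: (n, m), lam: (n, m).
    Returns (policy, value) for the last policy evaluated.
    """
    n, m = f.shape
    idx = np.arange(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        V = evaluate_discounted(P[idx, policy], f[idx, policy],
                                lam[idx, policy], alpha)
        Q = (f + lam * (P @ V)) / (alpha + lam)   # one-step lookahead, shape (n, m)
        greedy = Q.argmax(axis=1)
        # Switch actions only on strict improvement, to avoid cycling on ties.
        improved = Q[idx, greedy] > Q[idx, policy] + 1e-12
        new_policy = np.where(improved, greedy, policy)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V
```

Under these assumptions, `alpha * evaluate_discounted(...)` approaches `average_reward(...)` for a fixed policy as `alpha` tends to zero, which is one standard way to see the average criterion as a limit of the discounted one. The paper's sensitivity formulas may be stated differently.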

Bibliographic record

Source: World Congress on Intelligent Control and Automation, 2014, pp. 1741-1744 (4 pages)
Authors: Yanjie Li; Huijing Wang; Haoyao Chen
Original format: PDF
Keywords: SMDPs; performance difference; policy iteration

Similar literature

1. A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases [J]. Cao XR, Guo XP. Automatica, 2004, no. 10.
2. Performance Optimization of Semi-Markov Decision Processes with Discounted-Cost Criteria [J]. Baoqun Yin, Yanjie Li, Yaping Zhou. European Journal of Control, 2008, no. 3.
3. Semi-Markov decision processes with limiting ratio average rewards [J]. Sinha Sagnik, Mondal Prasenjit. Journal of Mathematical Analysis and Applications, 2017, no. 1.
4. A unified approach for semi-Markov decision processes with discounted and average reward criteria [C]. Yanjie Li, Huijing Wang, Haoyao Chen. World Congress on Intelligent Control and Automation, 2014.
5. A New Reinforcement Learning Algorithm with Fixed Exploration for Semi-Markov Decision Processes [D]. Encapera, Angelo Michael. 2017.
6. Learning to maximize reward rate: a model based on semi-Markov decision processes [O]. Arash Khodadadi, Pegah Fakhari, Jerome R. Busemeyer. 2014.
7. Markov Decision Processes: Discounted Expected Reward or Average Expected Reward? [O]. White D.J. 1993.
8. Countable State Discounted Markovian Decision Processes with Unbounded Rewards [R]. Harrison, J. M. 1970.
