Annual Conference on Neural Information Processing Systems

Regret based Robust Solutions for Uncertain Markov Decision Processes

Abstract

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value under the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) dependence of model uncertainties across state-action pairs and decision epochs; (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
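To make the sampling idea concrete, the sketch below selects, from a small set of candidate policies, the one minimizing the maximum regret across sampled (transition, reward) models. This is a minimal illustration under assumed shapes and hypothetical helper names (`value_iteration`, `policy_value`, `minimax_regret_policy`); it is not the paper's algorithm, which additionally addresses scalability and SAA quality bounds.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """P: (S, A, S) transition tensor, R: (S, A) rewards.
    Returns the (approximately) optimal values and a greedy policy."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)        # (S, A) action values
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

def policy_value(P, R, policy, gamma=0.95, iters=200):
    """Evaluate a fixed deterministic policy (array of actions) in one model."""
    S = R.shape[0]
    idx = np.arange(S)
    V = np.zeros(S)
    for _ in range(iters):
        V = R[idx, policy] + gamma * (P[idx, policy] @ V)
    return V

def minimax_regret_policy(samples, candidates, start=0):
    """Among candidate policies, pick the one whose worst-case regret
    (optimal value minus policy value at the start state) over the
    sampled models is smallest."""
    best, best_regret = None, np.inf
    for pi in candidates:
        worst = 0.0
        for P, R in samples:
            V_opt, _ = value_iteration(P, R)
            V_pi = policy_value(P, R, pi)
            worst = max(worst, V_opt[start] - V_pi[start])
        if worst < best_regret:
            best, best_regret = pi, worst
    return best, best_regret
```

Enumerating candidate policies is only feasible for toy problems; the paper's contribution is precisely to avoid such enumeration while retaining quality guarantees via the SAA analysis.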