Regret based Robust Solutions for Uncertain Markov Decision Processes

Abstract

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) handling uncertainties over both transition and reward models; (b) handling dependence of model uncertainties across state-action pairs and decision epochs; and (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
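
The difference between the maximin and minimax regret objectives named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the policy names, the value table, and both helper functions are hypothetical, and regret is measured only against the best of the listed candidate policies in each sampled model, which is a simplification.

```python
# Hypothetical data: values[p][k] is the expected value of candidate policy p
# under the k-th sampled (transition, reward) model. Numbers are made up.
values = {
    "A": [10.0, 4.0, 4.0],
    "B": [5.0, 5.0, 5.0],
    "C": [9.0, 3.0, 8.0],
}


def maximin_policy(values):
    """Pick the policy whose worst-case value over the samples is largest."""
    return max(values, key=lambda p: min(values[p]))


def minimax_regret_policy(values):
    """Pick the policy whose worst-case regret over the samples is smallest.

    Regret in sample k is the gap to the best candidate policy in that
    sample (an assumption; the true optimum per model may be higher).
    """
    n_samples = len(next(iter(values.values())))
    best = [max(v[k] for v in values.values()) for k in range(n_samples)]
    max_regret = {
        p: max(best[k] - v[k] for k in range(n_samples))
        for p, v in values.items()
    }
    policy = min(max_regret, key=max_regret.get)
    return policy, max_regret[policy]


if __name__ == "__main__":
    print("maximin choice:", maximin_policy(values))
    print("minimax regret choice:", minimax_regret_policy(values))
```

On this toy table the two objectives disagree: maximin prefers the policy with the highest guaranteed value, while minimax regret prefers the policy that is never far from the best candidate in any sampled model.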

Bibliographic details

Source
Annual conference on Neural Information Processing Systems | 2013 | pp. 881-889 | 9 pages
Authors
Asrar Ahmed; Pradeep Varakantham; Yossiri Adulyasak; Patrick Jaillet
Original format: PDF

Similar literature

1. Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs) [J]. Asrar Ahmed, Pradeep Varakantham, Meghna Lowalekar. The Journal of Artificial Intelligence Research. 2017, Issue 10
2. Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs) [J]. Asrar Ahmed, Pradeep Varakantham, Meghna Lowalekar. The Journal of Artificial Intelligence Research. 2017
3. Light robustness in the optimization of Markov decision processes with uncertain parameters [J]. Peter Buchholz, Dimitri Scheftelowitsch. Computers & Operations Research. 2019, Issue AUGa
4. Regret based Robust Solutions for Uncertain Markov Decision Processes [C]. Asrar Ahmed, Pradeep Varakantham, Yossiri Adulyasak. Annual conference on Neural Information Processing Systems. 2013
5. Regret-based reward elicitation for Markov decision processes [D]. Kevin Regan. 2014
6. Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play [O]. Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu. 2018
7. Parametric regret in uncertain Markov decision processes [O]. Huan Xu, Shie Mannor. 2009
