The Journal of Artificial Intelligence Research

Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

Abstract

Markov Decision Processes (MDPs) are an effective model for representing decision processes in the presence of transition uncertainty and reward tradeoffs. However, because the transition and reward functions of an MDP are difficult to specify exactly, researchers have proposed uncertain MDP models and robustness objectives for solving them. Most approaches to computing robust policies have focused on maximin policies, which maximize the worst-case value over all realisations of the uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only, and they are also limited in their scalability. We therefore provide a general model of uncertain MDPs that considers uncertainty over both the transition and reward functions, and we also consider dependence of the uncertainty across different states and decision epochs. We provide a mixed integer linear program (MILP) formulation for minimizing regret given a set of samples of the transition and reward functions of the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner; specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR, respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, performs better than directly optimizing regret.
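To make the sample-based regret objective concrete, here is a minimal Python sketch, not the authors' implementation: it assumes a standard finite MDP with tabular transition and reward arrays, and all function and variable names (value_iteration, policy_value, max_regret, start_dist) are hypothetical. For each sampled (transition, reward) instantiation, it computes the sample-optimal value by value iteration, evaluates the fixed policy exactly, and records the worst gap weighted by the start-state distribution.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Optimal value of one sampled MDP (P: S x A x S transitions, R: S x A rewards)."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_value(policy, P, R, gamma=0.95):
    """Exact value of a deterministic policy via (I - gamma * P_pi) V = R_pi."""
    S = P.shape[0]
    P_pi = P[np.arange(S), policy]     # S x S transition matrix under the policy
    R_pi = R[np.arange(S), policy]     # reward vector under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def max_regret(policy, samples, start_dist, gamma=0.95):
    """Regret of `policy`: worst gap, over all samples, between the
    sample-optimal value and the policy's value from the start distribution."""
    regret = -np.inf
    for P, R in samples:
        v_opt = value_iteration(P, R, gamma)
        v_pol = policy_value(policy, P, R, gamma)
        regret = max(regret, start_dist @ (v_opt - v_pol))
    return regret

# Toy usage: five random samples of a 4-state, 2-action uncertain MDP.
rng = np.random.default_rng(0)
S_, A_ = 4, 2
samples = []
for _ in range(5):
    P = rng.random((S_, A_, S_)); P /= P.sum(axis=2, keepdims=True)
    samples.append((P, rng.random((S_, A_))))
policy = np.zeros(S_, dtype=int)       # the policy that always takes action 0
print(max_regret(policy, samples, np.full(S_, 1 / S_)))

Minimizing this worst-case gap over all policies is what the paper's MILP formulation does exactly; the myopic surrogates CEMR and OSR trade that exactness for objectives that dynamic programming and policy iteration can optimize at scale.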
