Annual Conference on Neural Information Processing Systems

Regret based Robust Solutions for Uncertain Markov Decision Processes

Abstract

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value under the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) dependence of model uncertainties across state-action pairs and decision epochs; (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
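To make the sampling idea concrete, the sketch below selects, from a small set of candidate policies, the one minimizing the maximum regret across sampled (transition, reward) models. This is a minimal illustration under assumed shapes and hypothetical helper names (`value_iteration`, `policy_value`, `minimax_regret_policy`); it is not the paper's algorithm, which additionally addresses scalability and SAA quality bounds.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """P: (S, A, S) transition tensor, R: (S, A) rewards.
    Returns the (approximately) optimal values and a greedy policy."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)        # (S, A) action values
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

def policy_value(P, R, policy, gamma=0.95, iters=200):
    """Evaluate a fixed deterministic policy (array of actions) in one model."""
    S = R.shape[0]
    idx = np.arange(S)
    V = np.zeros(S)
    for _ in range(iters):
        V = R[idx, policy] + gamma * (P[idx, policy] @ V)
    return V

def minimax_regret_policy(samples, candidates, start=0):
    """Among candidate policies, pick the one whose worst-case regret
    (optimal value minus policy value at the start state) over the
    sampled models is smallest."""
    best, best_regret = None, np.inf
    for pi in candidates:
        worst = 0.0
        for P, R in samples:
            V_opt, _ = value_iteration(P, R)
            V_pi = policy_value(P, R, pi)
            worst = max(worst, V_opt[start] - V_pi[start])
        if worst < best_regret:
            best, best_regret = pi, worst
    return best, best_regret
```

Enumerating candidate policies is only feasible for toy problems; the paper's contribution is precisely to avoid such enumeration while retaining quality guarantees via the SAA analysis.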