Annual Conference on Neural Information Processing Systems

Regret based Robust Solutions for Uncertain Markov Decision Processes


Abstract

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on the computation of maximin policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) handling of uncertainties over both transition and reward models; (b) dependence of model uncertainties across state-action pairs and decision epochs; (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
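The abstract centers on the minimax regret objective: under a particular realization of the uncertain model, the regret of a policy is the gap between the optimal value for that model and the policy's value, and a robust policy minimizes the worst-case gap over the uncertainty set. The sketch below is not the paper's algorithm; it only illustrates, under assumed conditions (finite state and action spaces, a fixed discount factor, regret measured at a single start state, and illustrative function names), how the maximum regret of a fixed deterministic policy could be estimated from sampled transition and reward models, in the spirit of the sampling and SAA ideas mentioned above.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    # Optimal state values of a finite MDP.
    # P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)        # (S, A) action values under current V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_value(P, R, pi, gamma=0.95):
    # Exact value of a deterministic policy pi (length-S array of actions),
    # obtained by solving the linear Bellman equations.
    S = P.shape[0]
    P_pi = P[np.arange(S), pi]         # (S, S) transitions under pi
    R_pi = R[np.arange(S), pi]         # (S,) rewards under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def sampled_max_regret(samples, pi, start=0, gamma=0.95):
    # Maximum regret of pi over a list of sampled (P, R) models:
    # for each sample, regret = optimal value - value of pi at the start state.
    regrets = []
    for P, R in samples:
        v_star = value_iteration(P, R, gamma)
        v_pi = policy_value(P, R, pi, gamma)
        regrets.append(v_star[start] - v_pi[start])
    return max(regrets)

# Example: two sampled 2-state, 2-action models and a fixed policy.
rng = np.random.default_rng(0)
def random_model(S=2, A=2):
    P = rng.random((S, A, S)); P /= P.sum(axis=-1, keepdims=True)
    return P, rng.random((S, A))
samples = [random_model() for _ in range(2)]
print(sampled_max_regret(samples, pi=np.array([0, 1])))
```

In an SAA setting, this sampled maximum regret is only an estimate of the true worst-case regret over the full uncertainty set; the a posteriori error bounds mentioned in the abstract are what quantify the gap between the two.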
