Mathematics of Operations Research

Reinforcement Learning in Robust Markov Decision Processes

Abstract

An important challenge in Markov decision processes (MDPs) is to ensure robustness with respect to unexpected or adversarial system behavior. A standard paradigm for tackling this challenge is the robust MDP framework, which models the parameters as arbitrary elements of pre-defined "uncertainty sets" and seeks the minimax policy: the policy that performs best under the worst realization of the parameters in the uncertainty set. A crucial issue for the robust MDP framework, largely unaddressed in the literature, is how to find an appropriate description of the uncertainty in a principled, data-driven way. In this paper we address this problem using an online learning approach: we devise an algorithm that, without knowing the true uncertainty model, is able to adapt its level of protection to the uncertainty and, in the long run, performs as well as the minimax policy would if the true uncertainty model were known. Indeed, the algorithm achieves regret bounds similar to those of a standard MDP in which no parameter is adversarial, which shows that at virtually no extra cost robust learning can be adapted to handle uncertainty in MDPs. To the best of our knowledge, this is the first attempt to learn uncertainty in robust MDPs.
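To make the minimax objective above concrete, here is a minimal sketch of robust value iteration under one common, simple choice of uncertainty set: an L1 ball of radius eps around the nominal transition probabilities of each state-action pair. This is an assumption for illustration only; the paper's contribution is precisely that the right uncertainty set is learned from data rather than fixed in advance, and the names worst_case_expectation and robust_value_iteration are hypothetical, not from the paper.

import numpy as np

def worst_case_expectation(p, v, eps):
    """min of p'.v over {p' in simplex : ||p' - p||_1 <= eps}.

    Greedy solution of this inner linear program: shift up to eps/2
    probability mass onto the lowest-value state, taking it away
    from the highest-value states first."""
    imin = int(np.argmin(v))
    p = p.copy()
    delta = min(eps / 2.0, 1.0 - p[imin])   # mass the adversary may shift
    p[imin] += delta
    for s in np.argsort(v)[::-1]:           # highest-value states first
        if delta <= 0:
            break
        if s == imin:
            continue
        take = min(delta, p[s])
        p[s] -= take
        delta -= take
    return float(p @ v)

def robust_value_iteration(P, R, eps, gamma=0.95, iters=1000, tol=1e-8):
    """Robust Bellman iteration:
    V(s) = max_a [ R(s,a) + gamma * min_{p in U(s,a)} p.V ]."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                Q[s, a] = R[s, a] + gamma * worst_case_expectation(P[s, a], V, eps)
        V_new = Q.max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, Q.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 3, 2
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal model
    R = rng.uniform(size=(n_states, n_actions))
    for eps in (0.0, 0.4):   # eps = 0 recovers the standard, non-robust MDP
        V, pi = robust_value_iteration(P, R, eps)
        print(f"eps={eps}: V={np.round(V, 3)}, policy={pi}")

Setting eps = 0 recovers standard value iteration, while a larger eps buys more protection at the price of a more conservative policy; the algorithm described in the abstract can be read as learning the appropriate level of protection online instead of fixing it by hand.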
