Mathematics of Operations Research

Reinforcement Learning in Robust Markov Decision Processes

Abstract

An important challenge in Markov decision processes (MDPs) is to ensure robustness with respect to unexpected or adversarial system behavior. A standard paradigm for tackling this challenge is the robust MDP framework, which models the parameters as arbitrary elements of pre-defined "uncertainty sets" and seeks the minimax policy: the policy that performs best under the worst realization of the parameters in the uncertainty set. A crucial issue for the robust MDP framework, largely unaddressed in the literature, is how to find an appropriate description of the uncertainty in a principled, data-driven way. In this paper we address this problem using an online learning approach: we devise an algorithm that, without knowing the true uncertainty model, is able to adapt its level of protection to the uncertainty and, in the long run, performs as well as the minimax policy would if the true uncertainty model were known. Indeed, the algorithm achieves regret bounds similar to those of a standard MDP in which no parameter is adversarial, which shows that at virtually no extra cost robust learning can be adapted to handle uncertainty in MDPs. To the best of our knowledge, this is the first attempt to learn uncertainty in robust MDPs.
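To make the minimax objective above concrete, here is a minimal sketch of robust value iteration under one common, simple choice of uncertainty set: an L1 ball of radius eps around the nominal transition probabilities of each state-action pair. This is an assumption for illustration only; the paper's contribution is precisely that the right uncertainty set is learned from data rather than fixed in advance, and the names worst_case_expectation and robust_value_iteration are hypothetical, not from the paper.

import numpy as np

def worst_case_expectation(p, v, eps):
    """min of p'.v over {p' in simplex : ||p' - p||_1 <= eps}.

    Greedy solution of this inner linear program: shift up to eps/2
    probability mass onto the lowest-value state, taking it away
    from the highest-value states first."""
    imin = int(np.argmin(v))
    p = p.copy()
    delta = min(eps / 2.0, 1.0 - p[imin])   # mass the adversary may shift
    p[imin] += delta
    for s in np.argsort(v)[::-1]:           # highest-value states first
        if delta <= 0:
            break
        if s == imin:
            continue
        take = min(delta, p[s])
        p[s] -= take
        delta -= take
    return float(p @ v)

def robust_value_iteration(P, R, eps, gamma=0.95, iters=1000, tol=1e-8):
    """Robust Bellman iteration:
    V(s) = max_a [ R(s,a) + gamma * min_{p in U(s,a)} p.V ]."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                Q[s, a] = R[s, a] + gamma * worst_case_expectation(P[s, a], V, eps)
        V_new = Q.max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, Q.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 3, 2
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal model
    R = rng.uniform(size=(n_states, n_actions))
    for eps in (0.0, 0.4):   # eps = 0 recovers the standard, non-robust MDP
        V, pi = robust_value_iteration(P, R, eps)
        print(f"eps={eps}: V={np.round(V, 3)}, policy={pi}")

Setting eps = 0 recovers standard value iteration, while a larger eps buys more protection at the price of a more conservative policy; the algorithm described in the abstract can be read as learning the appropriate level of protection online instead of fixing it by hand.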
