International Conference on Machine Learning

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes



Abstract

The robust Markov Decision Process (MDP) framework aims to address the problem of parameter uncertainty due to model mismatch, approximation errors, or even adversarial behavior. It is especially relevant when deploying learned policies in real-world applications. Scaling up the robust MDP framework to large or continuous state spaces remains a challenging problem. The use of function approximation in this setting is usually inevitable, and it can only amplify the problems of model mismatch and parameter uncertainty. It has been previously shown that, in the case of MDPs with state aggregation, robust policies enjoy a tighter performance bound than standard solutions due to their reduced sensitivity to approximation errors. We extend these results to the much larger class of kernel-based approximators and show, both analytically and empirically, that robust policies can significantly outperform their non-robust counterparts.
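The abstract does not spell out the algorithm, but the two ingredients it combines can be sketched in isolation: a worst-case (robust) Bellman backup over an uncertainty set around the nominal transition model, and kernel-based averaging of the value function over representative states. The sketch below is illustrative only and is not the paper's method; the Gaussian kernel, the L1 uncertainty set, and all function names (`gaussian_kernel_weights`, `worst_case_expectation`, `robust_kernel_value_iteration`) are assumptions introduced here.

```python
# Minimal sketch of robust value iteration with kernel-based value smoothing.
# Purely illustrative: uncertainty set, kernel, and names are assumed, not from the paper.
import numpy as np

def gaussian_kernel_weights(X, centers, bandwidth=1.0):
    """Row-normalized Gaussian kernel weights from each state in X to the centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return W / W.sum(axis=1, keepdims=True)

def worst_case_expectation(p, V, eps):
    """min_q q.V over the L1 ball {q : ||q - p||_1 <= eps, q a distribution}."""
    q = p.copy()
    i_min = int(np.argmin(V))
    budget = min(eps / 2.0, 1.0 - q[i_min])   # mass the adversary pushes to the worst successor
    q[i_min] += budget
    for i in np.argsort(V)[::-1]:             # taken away from the best successors
        if i == i_min or budget <= 0.0:
            continue
        take = min(budget, q[i])
        q[i] -= take
        budget -= take
    return float(q @ V)

def robust_kernel_value_iteration(P_hat, R, K, gamma=0.95, eps=0.2, n_iter=300):
    """
    P_hat: nominal transitions, shape (A, S, S); R: rewards, shape (A, S);
    K: row-stochastic kernel weights, shape (S, S).
    Each backup uses the worst-case transition in an L1 ball around P_hat,
    then smooths the resulting value function through the kernel weights.
    """
    A, S, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                Q[a, s] = R[a, s] + gamma * worst_case_expectation(P_hat[a, s], V, eps)
        V = K @ Q.max(axis=0)                  # kernel-based averaging of the greedy values
    return V

# Example usage on random data (illustrative only):
# states = np.random.rand(20, 2)
# K = gaussian_kernel_weights(states, states, bandwidth=0.3)
# P_hat = np.random.dirichlet(np.ones(20), size=(3, 20))
# R = np.random.rand(3, 20)
# V_robust = robust_kernel_value_iteration(P_hat, R, K)
```

Setting `eps=0` in this sketch recovers a standard (non-robust) backup, which is one way to compare robust and non-robust policies under the same kernel approximation.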
