International Conference on Machine Learning

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes



Abstract

The robust Markov Decision Process (MDP) framework aims to address the problem of parameter uncertainty due to model mismatch, approximation errors, or even adversarial behavior. It is especially relevant when deploying learned policies in real-world applications. Scaling up the robust MDP framework to large or continuous state spaces remains a challenging problem. The use of function approximation in this setting is usually inevitable, and it can only amplify the problems of model mismatch and parameter uncertainty. It has been previously shown that, in the case of MDPs with state aggregation, robust policies enjoy a tighter performance bound than standard solutions due to their reduced sensitivity to approximation errors. We extend these results to the much larger class of kernel-based approximators and show, both analytically and empirically, that robust policies can significantly outperform their non-robust counterparts.
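The abstract does not spell out the algorithm, but the two ingredients it combines can be sketched in isolation: a worst-case (robust) Bellman backup over an uncertainty set around the nominal transition model, and kernel-based averaging of the value function over representative states. The sketch below is illustrative only and is not the paper's method; the Gaussian kernel, the L1 uncertainty set, and all function names (`gaussian_kernel_weights`, `worst_case_expectation`, `robust_kernel_value_iteration`) are assumptions introduced here.

```python
# Minimal sketch of robust value iteration with kernel-based value smoothing.
# Purely illustrative: uncertainty set, kernel, and names are assumed, not from the paper.
import numpy as np

def gaussian_kernel_weights(X, centers, bandwidth=1.0):
    """Row-normalized Gaussian kernel weights from each state in X to the centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return W / W.sum(axis=1, keepdims=True)

def worst_case_expectation(p, V, eps):
    """min_q q.V over the L1 ball {q : ||q - p||_1 <= eps, q a distribution}."""
    q = p.copy()
    i_min = int(np.argmin(V))
    budget = min(eps / 2.0, 1.0 - q[i_min])   # mass the adversary pushes to the worst successor
    q[i_min] += budget
    for i in np.argsort(V)[::-1]:             # taken away from the best successors
        if i == i_min or budget <= 0.0:
            continue
        take = min(budget, q[i])
        q[i] -= take
        budget -= take
    return float(q @ V)

def robust_kernel_value_iteration(P_hat, R, K, gamma=0.95, eps=0.2, n_iter=300):
    """
    P_hat: nominal transitions, shape (A, S, S); R: rewards, shape (A, S);
    K: row-stochastic kernel weights, shape (S, S).
    Each backup uses the worst-case transition in an L1 ball around P_hat,
    then smooths the resulting value function through the kernel weights.
    """
    A, S, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                Q[a, s] = R[a, s] + gamma * worst_case_expectation(P_hat[a, s], V, eps)
        V = K @ Q.max(axis=0)                  # kernel-based averaging of the greedy values
    return V

# Example usage on random data (illustrative only):
# states = np.random.rand(20, 2)
# K = gaussian_kernel_weights(states, states, bandwidth=0.3)
# P_hat = np.random.dirichlet(np.ones(20), size=(3, 20))
# R = np.random.rand(3, 20)
# V_robust = robust_kernel_value_iteration(P_hat, R, K)
```

Setting `eps=0` in this sketch recovers a standard (non-robust) backup, which is one way to compare robust and non-robust policies under the same kernel approximation.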
