Published in: International Conference on Autonomous Agents and Multiagent Systems

Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes

Abstract

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of policies that minimizes the regret incurred over a finite time horizon. For several RL problems in operations research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. State-of-the-art algorithms do not use this known structure of the optimal policy when minimizing regret. In this work, we develop new RL algorithms that exploit the structure of the optimal policy to minimize regret. Numerical experiments on MDPs with structured optimal policies show that our algorithms perform better and are easy to implement.
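To make the idea of a "known structure" concrete, the sketch below uses a threshold policy as a hypothetical example of structure; the abstract does not say which structures the paper considers, and this is not the paper's algorithm. It assumes a toy MDP with integer states and two actions, and shows how restricting attention to threshold policies shrinks the set of candidate policies. The regret helper follows one common average-reward definition (optimal per-step reward times the horizon, minus the reward actually collected), which is also an assumption here. All function names are illustrative.

```python
from itertools import product


def threshold_policy(threshold, num_states):
    """Hypothetical structured policy: action 1 in states >= threshold, else 0."""
    return tuple(1 if s >= threshold else 0 for s in range(num_states))


def all_policies(num_states):
    """Every deterministic two-action policy: 2 ** num_states candidates."""
    return list(product((0, 1), repeat=num_states))


def all_threshold_policies(num_states):
    """Only the threshold-structured policies: num_states + 1 candidates."""
    return [threshold_policy(t, num_states) for t in range(num_states + 1)]


def cumulative_regret(optimal_reward_per_step, rewards):
    """Regret over T = len(rewards) steps under the assumed definition:
    T * (optimal per-step reward) minus the total reward collected."""
    return optimal_reward_per_step * len(rewards) - sum(rewards)


if __name__ == "__main__":
    n = 5
    print(len(all_policies(n)), "unstructured policies")
    print(len(all_threshold_policies(n)), "threshold policies")
    print(cumulative_regret(1.0, [1.0, 0.5, 0.0]), "regret after 3 steps")
```

Under these assumptions, a learner that knows the optimal policy is a threshold policy only has to compare a number of candidates that grows linearly in the number of states, rather than exponentially, which is one intuitive reason structure can help reduce regret.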
