Conference: Annual American Control Conference

Online Learning for Markov Decision Processes in Nonstationary Environments: A Dynamic Regret Analysis


Abstract

In an online Markov decision process (MDP) with time-varying reward functions, a decision maker must take an action at each time step before knowing the current reward function. This problem has attracted considerable research interest because of its wide range of applications. The literature usually focuses on static regret, which compares the total reward of the optimal offline stationary policy with that of the online policies. This paper studies a different measure, dynamic regret, which is the difference in total reward between the optimal offline (possibly nonstationary) policies and the online policies. This measure is better suited to time-varying environments. To obtain a meaningful regret analysis, we introduce a notion of total variation for the time-varying reward functions and bound the dynamic regret in terms of this total variation. We propose an online algorithm, Follow the Weighted Leader (FWL), and prove an upper bound on its dynamic regret in terms of the total variation. We also prove a lower bound on the dynamic regret of any online algorithm. The lower bound matches the upper bound of FWL, demonstrating the optimality of the algorithm. Finally, we show via simulation that FWL significantly outperforms existing algorithms in the literature.
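The gap between static and dynamic regret can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the environment is reduced to a single state (so a policy is just an action choice per step), the online learner is plain follow-the-leader rather than FWL, and `total_variation` is one plausible way to measure how much the reward functions change, which may differ from the paper's definition.

```python
# Toy illustration only: single-state setting, hypothetical helper names.

def static_comparator(rewards):
    """Total reward of the best fixed action in hindsight."""
    n_actions = len(rewards[0])
    return max(sum(r[a] for r in rewards) for a in range(n_actions))

def dynamic_comparator(rewards):
    """Total reward of the best action chosen separately at every step."""
    return sum(max(r) for r in rewards)

def total_variation(rewards):
    """Assumed measure: sum over steps of the largest per-action change."""
    return sum(
        max(abs(x - y) for x, y in zip(cur, prev))
        for prev, cur in zip(rewards, rewards[1:])
    )

def follow_the_leader(rewards):
    """Plain follow-the-leader (not FWL): play the action with the highest
    cumulative past reward, then observe the current reward vector."""
    n_actions = len(rewards[0])
    cumulative = [0.0] * n_actions
    total = 0.0
    for r in rewards:
        # Ties are broken in favor of the lowest action index.
        action = max(range(n_actions), key=lambda a: cumulative[a])
        total += r[action]
        cumulative = [c + x for c, x in zip(cumulative, r)]
    return total

# Rewards favor action 0 for three steps, then switch to action 1.
rewards = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3

online = follow_the_leader(rewards)
static_regret = static_comparator(rewards) - online
dynamic_regret = dynamic_comparator(rewards) - online

print(static_regret, dynamic_regret, total_variation(rewards))
```

In this example the learner looks perfect against the best fixed action yet loses every step after the switch when compared against the per-step optimum, which is the kind of gap that dynamic regret is meant to expose.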
