Game Theory for Networks, 2009 (GameNets '09)

Online learning in Markov decision processes with arbitrarily changing rewards and transitions



Abstract

We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies-i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
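The abstract describes algorithms that achieve low regret against alternative policies even when rewards change arbitrarily over time. The paper's own algorithms are not reproduced here; as a minimal illustration of the underlying online-learning idea, the sketch below runs the standard exponential-weights (Hedge) forecaster over two candidate policies in a toy single-state decision problem with adversarially varying rewards. All names, the reward sequences, and the two-policy setup are illustrative assumptions, not the paper's construction, which additionally handles changing transition probabilities via robust control.

```python
import math
import random

def hedge_update(weights, losses, eta):
    """One exponential-weights (Hedge) step: down-weight each expert by its loss."""
    new_w = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Toy setting (an assumption for illustration): two fixed candidate policies,
# rewards in [0, 1] that vary arbitrarily from round to round.
random.seed(0)
T = 1000
eta = math.sqrt(math.log(2) / T)  # standard Hedge rate for 2 experts, horizon T
weights = [0.5, 0.5]
cum_alg = 0.0
cum_best = [0.0, 0.0]
for t in range(T):
    # Arbitrarily (non-stationarily) changing rewards for each policy.
    rewards = [0.5 + 0.5 * math.sin(t / 50.0), random.random()]
    cum_alg += sum(w * r for w, r in zip(weights, rewards))
    cum_best = [c + r for c, r in zip(cum_best, rewards)]
    weights = hedge_update(weights, [1.0 - r for r in rewards], eta)

# Regret: best fixed policy's cumulative reward minus the algorithm's.
regret = max(cum_best) - cum_alg
print(f"regret after {T} rounds: {regret:.1f}")
```

With the learning rate above, Hedge's regret is bounded by roughly sqrt(T ln N / 2), i.e., sublinear in T, which is the sense in which such guarantees "hold regardless of the changes in rewards". The paper extends this kind of guarantee to full MDPs, where the transition probabilities also change and the bound degrades with the range of transition uncertainty.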
