Game Theory for Networks, 2009 (GameNets '09)

Online learning in Markov decision processes with arbitrarily changing rewards and transitions



Abstract

We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies-i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
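The abstract describes algorithms that achieve low regret against alternative policies even when rewards change arbitrarily over time. The paper's own algorithms are not reproduced here; as a minimal illustration of the underlying online-learning idea, the sketch below runs the standard exponential-weights (Hedge) forecaster over two candidate policies in a toy single-state decision problem with adversarially varying rewards. All names, the reward sequences, and the two-policy setup are illustrative assumptions, not the paper's construction, which additionally handles changing transition probabilities via robust control.

```python
import math
import random

def hedge_update(weights, losses, eta):
    """One exponential-weights (Hedge) step: down-weight each expert by its loss."""
    new_w = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new_w)
    return [w / total for w in new_w]

# Toy setting (an assumption for illustration): two fixed candidate policies,
# rewards in [0, 1] that vary arbitrarily from round to round.
random.seed(0)
T = 1000
eta = math.sqrt(math.log(2) / T)  # standard Hedge rate for 2 experts, horizon T
weights = [0.5, 0.5]
cum_alg = 0.0
cum_best = [0.0, 0.0]
for t in range(T):
    # Arbitrarily (non-stationarily) changing rewards for each policy.
    rewards = [0.5 + 0.5 * math.sin(t / 50.0), random.random()]
    cum_alg += sum(w * r for w, r in zip(weights, rewards))
    cum_best = [c + r for c, r in zip(cum_best, rewards)]
    weights = hedge_update(weights, [1.0 - r for r in rewards], eta)

# Regret: best fixed policy's cumulative reward minus the algorithm's.
regret = max(cum_best) - cum_alg
print(f"regret after {T} rounds: {regret:.1f}")
```

With the learning rate above, Hedge's regret is bounded by roughly sqrt(T ln N / 2), i.e., sublinear in T, which is the sense in which such guarantees "hold regardless of the changes in rewards". The paper extends this kind of guarantee to full MDPs, where the transition probabilities also change and the bound degrades with the range of transition uncertainty.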
