European Workshop on Reinforcement Learning

Markov Decision Processes with Arbitrary Reward Processes


Abstract

We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.
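The hindsight guarantee in the abstract can be illustrated with a toy experts-style sketch. This is a hypothetical simplification, not the paper's algorithm: it treats every deterministic stationary policy of a small two-state MDP as an expert and runs an exponentially weighted forecaster over their simulated per-round rewards, which sidesteps the state-coupling (the carry-over of state between rounds) that the actual MDP setting must handle. The MDP dynamics, the reward sequence, and all names below are made up for illustration; the reward sequence is oblivious, i.e. fixed in advance and independent of the agent's actions.

```python
import itertools
import math
import random

random.seed(0)

# Toy MDP (hypothetical): 2 states, 2 actions, deterministic dynamics.
N_STATES, N_ACTIONS, T = 2, 2, 500

def transition(s, a):
    # Action 0 stays in the current state; action 1 switches state.
    return s if a == 0 else 1 - s

# Oblivious, arbitrary reward sequence: rewards[t][s][a] is fixed in
# advance and does not depend on the agent's behavior.
rewards = [[[random.random() for _ in range(N_ACTIONS)]
            for _ in range(N_STATES)] for _ in range(T)]

# Experts = all deterministic stationary policies (a map state -> action).
policies = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))

def run_policy(pi):
    """Simulate one policy from state 0; return its per-round rewards."""
    s, per_round = 0, []
    for t in range(T):
        a = pi[s]
        per_round.append(rewards[t][s][a])
        s = transition(s, a)
    return per_round

per_round = {pi: run_policy(pi) for pi in policies}
best = max(sum(r) for r in per_round.values())  # best policy in hindsight

# Exponentially weighted forecaster over policies: the agent's expected
# reward each round is the weight-averaged reward of the policies, and
# weights grow multiplicatively with each policy's observed reward.
eta = math.sqrt(8 * math.log(len(policies)) / T)
weights = {pi: 1.0 for pi in policies}
agent_reward = 0.0
for t in range(T):
    z = sum(weights.values())
    agent_reward += sum(w / z * per_round[pi][t]
                        for pi, w in weights.items())
    for pi in weights:
        weights[pi] *= math.exp(eta * per_round[pi][t])

regret = best - agent_reward
print(f"average regret per round: {regret / T:.4f}")
```

With rewards in [0, 1], the standard exponentially weighted average bound gives cumulative regret at most sqrt((T/2) ln N) against the best of the N policies, so the average regret per round vanishes as T grows, mirroring the Hannan-consistency guarantee the abstract states for the oblivious case.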
