Journal of Machine Learning Research

A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes



Abstract

Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains by introducing the Bayes-Adaptive Partially Observable Markov Decision Process (BAPOMDP). This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and the agent's return improve as a function of experience.
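The belief-tracking idea the abstract describes can be illustrated with a small sketch. Below is a minimal, illustrative Bayes-adaptive belief update for a toy two-state domain: the transition model is unknown and represented by Dirichlet counts, the observation model is assumed known, and the belief is a weighted set of hypotheses over the joint (state, counts) space, truncated to a fixed number of particles. The toy domain, all names, and the truncation scheme are assumptions for illustration, not the paper's algorithm.

```python
# A minimal sketch of Bayes-adaptive POMDP belief tracking (illustrative only).
# Hybrid belief: distribution over (s, phi), where s is the hidden state and
# phi maps each (state, action) pair to Dirichlet counts over next states.
# The observation model OBS_PROB is assumed known; transitions are learned.

STATES = [0, 1]
OBS_PROB = {  # known P(o | s'): state 0 mostly emits obs 0, state 1 emits obs 1
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.2, 1: 0.8},
}

def expected_trans(phi, s, a, s2):
    """Posterior-mean transition probability implied by the Dirichlet counts."""
    counts = phi[(s, a)]
    return counts[s2] / sum(counts)

def belief_update(belief, a, o, max_particles=50):
    """One belief update after taking action a and observing o.

    belief: list of (weight, s, phi) hypotheses. Each hypothesis branches
    over every possible next state s2; the branch that assumes the system
    moved to s2 increments the corresponding transition count. The exact
    update grows the particle set, so it is truncated to max_particles.
    """
    new = []
    for w, s, phi in belief:
        for s2 in STATES:
            w2 = w * expected_trans(phi, s, a, s2) * OBS_PROB[s2][o]
            if w2 <= 0.0:
                continue
            phi2 = dict(phi)                 # copy counts for this branch
            c = list(phi2[(s, a)])
            c[s2] += 1                       # record the assumed transition
            phi2[(s, a)] = tuple(c)
            new.append((w2, s2, phi2))
    new.sort(key=lambda p: -p[0])            # keep the most likely particles
    new = new[:max_particles]
    z = sum(w for w, _, _ in new)            # renormalize remaining weights
    return [(w / z, s, phi) for w, s, phi in new]

# Uniform Dirichlet(1,1) prior over transitions, starting in state 0.
prior_phi = {(s, a): (1, 1) for s in STATES for a in [0]}
belief = [(1.0, 0, prior_phi)]
belief = belief_update(belief, a=0, o=1)     # act, then observe o = 1
```

Observing `o = 1` shifts weight toward hypotheses in which the system moved to state 1, and those hypotheses carry updated counts for the assumed transition; this is the sense in which model learning and state tracking happen simultaneously in a single belief.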
