IEEE Conference on Decision and Control

Stochastic optimization of controlled partially observable Markov decision processes



Abstract

We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ (0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
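The abstract describes an online, single-sample-path policy-gradient update in which β discounts an eligibility trace, trading bias against variance. The sketch below illustrates that kind of update, not the paper's exact algorithm; the linear-softmax policy, the hypothetical `env_step` callable (returning only an observation and a reward), and all names are assumptions introduced here for illustration.

```python
import numpy as np

def softmax_policy(theta, obs):
    """Action probabilities for a linear-softmax policy; theta has shape (n_actions, obs_dim)."""
    logits = theta @ obs
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def online_gradient_step(theta, z, obs, beta, step_size, env_step, rng):
    """One update from a single sample path; uses only observations and rewards,
    never the underlying state.

    beta in (0, 1) discounts the eligibility trace z: values closer to 1 reduce bias
    (longer credit horizon) at the cost of higher variance, the trade-off noted above.
    """
    probs = softmax_policy(theta, obs)
    a = rng.choice(len(probs), p=probs)

    # gradient of log pi(a | obs; theta) for the linear-softmax policy
    grad_log_pi = -np.outer(probs, obs)
    grad_log_pi[a] += obs

    next_obs, reward = env_step(a)          # hypothetical environment interface
    z = beta * z + grad_log_pi              # eligibility trace
    theta = theta + step_size * reward * z  # stochastic ascent on the average reward
    return theta, z, next_obs
```

A training loop would initialize z to zeros of theta's shape and call online_gradient_step repeatedly along one trajectory, which mirrors the single-sample-path property claimed in the abstract.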
