Proceedings of the 39th IEEE Conference on Decision and Control, 2000

Stochastic optimization of controlled partially observable Markov decision processes



Abstract

We introduce an online algorithm for finding local maxima of the average reward in a partially observable Markov decision process (POMDP) controlled by a parameterized policy. Optimization is over the parameters of the policy. The algorithm's chief advantages are that it requires only a single sample path of the POMDP, it uses only one free parameter β ∈ (0, 1), which has a natural interpretation in terms of a bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces. We prove almost-sure convergence of our algorithm, and show how the correct setting of β is related to the mixing time of the Markov chain induced by the POMDP.
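The kind of update the abstract describes (an online, single-sample-path policy-gradient method whose only free parameter is β, discounting an eligibility trace) can be sketched roughly as follows. The two-state toy POMDP, the logistic policy, and all names below are illustrative assumptions for this sketch, not the paper's actual construction or experiments:

```python
import math
import random

def observe(state):
    # Noisy observation of the hidden state: correct with probability 0.8.
    return state if random.random() < 0.8 else 1 - state

def step(state, action):
    # Toy dynamics: action 1 flips the state with high probability,
    # action 0 mostly keeps it. Reward 1 whenever the next state is 1.
    flip = random.random() < (0.9 if action == 1 else 0.1)
    nxt = 1 - state if flip else state
    return nxt, float(nxt == 1)

def action_prob(theta, obs):
    # Probability of action 0 under a logistic policy keyed on the observation.
    return 1.0 / (1.0 + math.exp(-theta[obs]))

def online_policy_gradient(beta=0.9, lr=0.02, steps=50000, seed=0):
    # Sketch of an online beta-trace policy-gradient loop: the agent sees
    # only observations (never the state), follows a single sample path,
    # and ascends the reward-weighted eligibility trace.
    random.seed(seed)
    theta = [0.0, 0.0]   # policy parameters: one logit per observation
    z = [0.0, 0.0]       # eligibility trace, discounted by beta each step
    state = 0
    for _ in range(steps):
        obs = observe(state)
        p0 = action_prob(theta, obs)
        action = 0 if random.random() < p0 else 1
        # Gradient of log pi(action | obs) w.r.t. theta[obs].
        grad = (1.0 - p0) if action == 0 else -p0
        # beta near 1: longer credit assignment (low bias, high variance);
        # beta near 0: short traces (high bias, low variance).
        z = [beta * zi for zi in z]
        z[obs] += grad
        state, reward = step(state, action)
        theta = [th + lr * reward * zi for th, zi in zip(theta, z)]
    return theta
```

In this toy environment the learned logits should come to favor keeping the state when the observation suggests state 1 and flipping when it suggests state 0; the abstract's convergence and mixing-time results concern how large β must be relative to the induced chain's mixing time for such updates to track the true gradient.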


