International Conference on Industrial Technology

A Special Case of Partially Observable Markov Decision Processes Problem by Event-Based Optimization



Abstract

In this paper, we discuss a special kind of partially observable Markov decision process (POMDP) problem using the event-based optimization framework proposed in [4]. A POMDP ([7] and [8]) is a generalization of a standard, completely observable Markov decision process that allows imperfect information about the states of the system. Policy iteration algorithms for POMDPs have proved impractical because they are very difficult to implement, so most work on POMDPs has used value iteration. For a special case of POMDP, however, the problem can be formulated as an MDP. We then use the sensitivity view to derive the corresponding average-reward difference formula. Based on this formula and the idea of event-based optimization, we use a single sample path to estimate the aggregated potentials, and we develop policy iteration (PI) algorithms accordingly.
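The core recipe in the abstract — estimate potentials from a single sample path, then improve the policy via the average-reward difference formula — can be sketched on a toy, fully observable MDP. Everything below (state space, transition matrices, rewards, truncation horizon) is an illustrative assumption for the sketch, not the paper's model or its event-based aggregation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 3 states, 2 actions. P[a] is the transition matrix under
# action a; r[a, i] is the reward for taking action a in state i.
P = np.array([[[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.3, 0.5]],
              [[0.3, 0.4, 0.3],
               [0.5, 0.3, 0.2],
               [0.1, 0.1, 0.8]]])
r = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.5, 0.2]])

def simulate(policy, n_steps=50_000):
    """Generate one sample path (states, rewards) under a stationary policy."""
    states = np.empty(n_steps, dtype=int)
    rewards = np.empty(n_steps)
    s = 0
    for t in range(n_steps):
        a = policy[s]
        states[t] = s
        rewards[t] = r[a, s]
        s = rng.choice(3, p=P[a, s])
    return states, rewards

def estimate_potentials(states, rewards, horizon=50):
    """Estimate potentials g(i) ~ E[sum_{t<T} (r(X_t) - eta) | X_0 = i]
    from a single sample path, truncating the sum at `horizon` steps."""
    eta = rewards.mean()                      # average reward along the path
    g_sum = np.zeros(3)
    g_cnt = np.zeros(3)
    for t in range(len(states) - horizon):
        i = states[t]
        g_sum[i] += rewards[t:t + horizon].sum() - horizon * eta
        g_cnt[i] += 1
    return g_sum / np.maximum(g_cnt, 1), eta

# One round of sample-path-based policy iteration.
policy = np.array([0, 0, 0])
states, rewards = simulate(policy)
g, eta = estimate_potentials(states, rewards)

# Improvement step suggested by the average-reward difference formula:
# in each state, pick the action maximizing r(i, a) + sum_j P_a(i, j) g(j).
new_policy = np.argmax(r + P @ g, axis=0)
```

Iterating the simulate/estimate/improve loop until the policy stops changing gives a PI algorithm driven entirely by observed sample paths; the paper's contribution is doing this with *aggregated* potentials over events rather than per-state potentials as above.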
