International Conference on Industrial Technology

A Special Case of Partially Observable Markov Decision Processes Problem by Event-Based Optimization



Abstract

In this paper, we discuss a special kind of partially observable Markov decision process (POMDP) problem using the event-based optimization framework proposed in [4]. A POMDP ([7] and [8]) is a generalization of a standard, completely observable Markov decision process that allows imperfect information about the states of the system. Policy iteration algorithms for POMDPs have proved impractical because they are very difficult to implement, so most work on POMDPs has used value iteration. For a special case of POMDP, however, the problem can be formulated as an MDP. We then use the sensitivity view to derive the corresponding average-reward difference formula. Based on this formula and the idea of event-based optimization, we use a single sample path to estimate the aggregated potentials, and we develop policy iteration (PI) algorithms accordingly.
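The core recipe in the abstract — estimate potentials from a single sample path, then improve the policy via the average-reward difference formula — can be sketched on a toy, fully observable MDP. Everything below (state space, transition matrices, rewards, truncation horizon) is an illustrative assumption for the sketch, not the paper's model or its event-based aggregation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 3 states, 2 actions. P[a] is the transition matrix under
# action a; r[a, i] is the reward for taking action a in state i.
P = np.array([[[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.3, 0.5]],
              [[0.3, 0.4, 0.3],
               [0.5, 0.3, 0.2],
               [0.1, 0.1, 0.8]]])
r = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.5, 0.2]])

def simulate(policy, n_steps=50_000):
    """Generate one sample path (states, rewards) under a stationary policy."""
    states = np.empty(n_steps, dtype=int)
    rewards = np.empty(n_steps)
    s = 0
    for t in range(n_steps):
        a = policy[s]
        states[t] = s
        rewards[t] = r[a, s]
        s = rng.choice(3, p=P[a, s])
    return states, rewards

def estimate_potentials(states, rewards, horizon=50):
    """Estimate potentials g(i) ~ E[sum_{t<T} (r(X_t) - eta) | X_0 = i]
    from a single sample path, truncating the sum at `horizon` steps."""
    eta = rewards.mean()                      # average reward along the path
    g_sum = np.zeros(3)
    g_cnt = np.zeros(3)
    for t in range(len(states) - horizon):
        i = states[t]
        g_sum[i] += rewards[t:t + horizon].sum() - horizon * eta
        g_cnt[i] += 1
    return g_sum / np.maximum(g_cnt, 1), eta

# One round of sample-path-based policy iteration.
policy = np.array([0, 0, 0])
states, rewards = simulate(policy)
g, eta = estimate_potentials(states, rewards)

# Improvement step suggested by the average-reward difference formula:
# in each state, pick the action maximizing r(i, a) + sum_j P_a(i, j) g(j).
new_policy = np.argmax(r + P @ g, axis=0)
```

Iterating the simulate/estimate/improve loop until the policy stops changing gives a PI algorithm driven entirely by observed sample paths; the paper's contribution is doing this with *aggregated* potentials over events rather than per-state potentials as above.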
