Journal of Quality in Maintenance Engineering

An optimal policy for partially observable Markov decision processes with non-independent monitors

Abstract

Purpose - This research investigated the structure of the optimal policy for a discrete-time Markov deterioration system monitored by multiple non-independent monitors. The aim is to obtain a sufficient condition under which the optimal policy is a control limit policy.

Design/methodology/approach - The model is formulated as a partially observable Markov decision process. The problem is to find an optimal policy that minimizes the expected total discounted cost over an infinite horizon.

Findings - The research found that the optimal expected cost function over an infinite horizon has the control limit policy property, provided that the transition probabilities are totally positive of order 2 and the conditional probabilities of the monitors' observations have the weak multivariate monotone likelihood ratio property. Furthermore, the optimal policy was shown to have at most four action regions.

Practical implications - If the optimal policy can be restricted to a control limit policy, the large amount of computation required to find the optimal procedure can be reduced, so the best decision can be identified in much less time.

Originality/value - Previous research studied a deterioration system monitored incompletely by a single monitor. This research considered the case of multiple monitors whose observations are not independent.
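
The two ingredients named in the abstract, a transition matrix that is totally positive of order 2 and a belief state updated from several dependent monitors, can be illustrated with a small sketch. This is not the paper's model: the three-state matrix `P`, the joint observation matrix `Q` for two binary monitors, and the function names `is_tp2` and `belief_update` are all hypothetical, chosen only to show the standard definitions. A matrix is totally positive of order 2 when every 2x2 minor is nonnegative, and the belief update is the usual Bayes step, with the joint observation probability used directly rather than a product of per-monitor probabilities.

```python
import numpy as np


def is_tp2(matrix, tol=1e-12):
    """Return True if every 2x2 minor of the matrix is nonnegative."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = m.shape
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    if m[i, j] * m[k, l] - m[i, l] * m[k, j] < -tol:
                        return False
    return True


def belief_update(belief, transition, joint_obs, y):
    """Bayes update of the belief after one transition and joint observation y.

    joint_obs[s, y] is the probability of the joint monitor outcome y in
    state s; it is not assumed to factor across monitors.
    """
    predicted = np.asarray(belief, dtype=float) @ np.asarray(transition, dtype=float)
    unnormalized = predicted * np.asarray(joint_obs, dtype=float)[:, y]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total


# Hypothetical deterioration chain: state 0 is good, state 2 is failed.
P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])

# Hypothetical joint distribution of two binary monitors; columns are the
# joint outcomes (0,0), (0,1), (1,0), (1,1). Rows do not factor into
# per-monitor marginals, so the monitors are dependent.
Q = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.3, 0.2, 0.2, 0.3],
              [0.1, 0.1, 0.1, 0.7]])

posterior = belief_update([1.0, 0.0, 0.0], P, Q, y=3)
print(is_tp2(P), posterior)
```

Under a control limit policy, the action would then be chosen by comparing the belief against thresholds instead of searching the whole belief space, which is where the computational saving mentioned in the abstract comes from.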