Fast MCVI Based on Improved NSGA2

Abstract

Partially observable Markov decision processes (POMDPs) are widely used in many fields. Solving a POMDP suffers from prohibitive computational complexity due to the curse of dimensionality, and Monte Carlo value iteration (MCVI) is regarded as a promising approach to breaking that curse. Although MCVI is a major step toward solving this problem, it still has defects, such as a slow convergence rate and continuous growth in the number of policy-graph nodes. This paper therefore proposes a fast MCVI based on an improved NSGA2. Unlike the general NSGA2, the improved NSGA2 initializes the population from experiential knowledge and uses a self-adjusting value as the crossover and mutation probability. Before executing MCVI, the algorithm sets a series of thresholds. When a temporary policy graph reaches one of the thresholds, the algorithm uses a discount operator to update that threshold and uses the improved NSGA2 to update the policy graph. It then executes MCVI again and repeats this process until termination. Numerical experiments on the classic corridor problem show that the fast MCVI achieves about an 8% increase in convergence rate over the original MCVI and about a 60% decrease in the number of policy-graph nodes.
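The alternation between MCVI and the improved NSGA2 described in the abstract can be sketched roughly as below. This is only an illustration of the control flow, not the authors' implementation. The abstract does not say what the thresholds measure or how the discount operator is defined, so the sketch assumes the thresholds are node-count limits and the discount operator is a multiplication by a constant factor. The names `run_with_thresholds`, `grow`, and `prune` are hypothetical stand-ins for one MCVI pass and one improved-NSGA2 update.

```python
from typing import Callable, List

# Hypothetical stand-in: a policy graph represented as a list of node labels.
Graph = List[str]


def run_with_thresholds(
    grow: Callable[[Graph], Graph],   # stand-in for one MCVI pass (adds nodes)
    prune: Callable[[Graph], Graph],  # stand-in for the improved-NSGA2 update
    thresholds: List[float],          # ASSUMPTION: node-count limits
    discount: float,                  # ASSUMPTION: multiplicative discount operator
    passes: int,
) -> Graph:
    """Alternate graph growth with threshold-triggered pruning."""
    graph: Graph = []
    limits = sorted(thresholds)
    for _ in range(passes):
        graph = grow(graph)
        for i, limit in enumerate(limits):
            if len(graph) >= limit:
                # Update the threshold that was reached, then update the graph.
                limits[i] = limit * discount
                graph = prune(graph)
                break  # handle at most one threshold per pass
    return graph


if __name__ == "__main__":
    # Toy stand-ins: each pass adds four nodes, pruning keeps the first half.
    result = run_with_thresholds(
        grow=lambda g: g + ["node"] * 4,
        prune=lambda g: g[: len(g) // 2],
        thresholds=[8],
        discount=0.5,
        passes=3,
    )
    print(len(result))
```

With these toy stand-ins the graph would reach twelve nodes after three passes if no pruning occurred, so the final size shows the effect of the threshold-triggered update. The real method would replace `grow` with MCVI backups and `prune` with a multi-objective search over policy graphs.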
