In: Pacific Rim International Conference on Artificial Intelligence

POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm



Abstract

Point-based value iteration methods are a family of effective algorithms for solving POMDP models, and their performance depends mainly on how they explore the search space. Although global optimization can be achieved by algorithms such as HSVI and GapMin, their exploration of the optimal action is overly optimistic, which slows them down. In this paper, we propose a novel heuristic search method, POPVI (Probability-based Optimal Policy Value Iteration), which explores the optimal action based on probability. During depth-first heuristic exploration, the algorithm uses a Monte-Carlo method to estimate the probability that each action is optimal according to the distribution of the actions' Q-value functions, selects the action with the maximum probability, and greedily explores the subsequent belief point with the greatest uncertainty. Experimental results show that POPVI outperforms HSVI, and by a large margin as the scale of the POMDP increases.
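The Monte-Carlo step described in the abstract can be sketched as follows. This is only an illustrative sketch under an assumed model: the abstract does not specify the form of the Q-value distributions, so each action's Q-value is treated here as an independent Gaussian, and the function name `prob_optimal` and its parameters are hypothetical.

```python
import random

def prob_optimal(q_means, q_stds, n_samples=10000):
    """Monte-Carlo estimate of the probability that each action is optimal.

    Each action's Q-value is modeled (as an assumption, not from the paper)
    as an independent Gaussian with the given mean and standard deviation.
    We repeatedly sample one Q-value per action and count how often each
    action attains the maximum of the sampled values.
    """
    counts = [0] * len(q_means)
    for _ in range(n_samples):
        draws = [random.gauss(m, s) for m, s in zip(q_means, q_stds)]
        counts[draws.index(max(draws))] += 1
    return [c / n_samples for c in counts]
```

A heuristic search would then apply the action with the largest estimated probability; an action with a lower mean but high variance can still receive non-negligible probability mass, which is what distinguishes this selection rule from a purely optimistic upper-bound choice.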
