POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

Abstract

Point-based value iteration methods are a family of effective algorithms for solving POMDP models, and their performance depends mainly on how they explore the search space. Although algorithms such as HSVI and GapMin can reach a global optimum, their exploration of the optimal action is overly optimistic, which reduces their efficiency. In this paper, we propose a novel heuristic search method, POPVI (Probability-based Optimal Policy Value Iteration), which explores the optimal action based on probability. During depth-first heuristic exploration, the algorithm uses a Monte-Carlo method to estimate the probability that each action is optimal according to the distribution of the actions' Q-value functions, applies the action with the maximum probability, and greedily explores the successor belief point with the greatest uncertainty. Experimental results show that POPVI outperforms HSVI, and by a large margin as the scale of the POMDP increases.
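
The action-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distribution is placed on each action's Q-value, so the code below assumes, purely for illustration, that each Q-value is uniformly distributed between a lower and an upper bound. The names `estimate_optimal_action_probs`, `select_action`, and `q_bounds` are hypothetical and do not come from the paper.

```python
import random


def estimate_optimal_action_probs(q_bounds, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of P(action is optimal) for each action.

    q_bounds: list of (lower, upper) Q-value bounds, one pair per action.
    ASSUMPTION: each Q-value is uniform on [lower, upper]; the abstract
    does not specify the distribution actually used by POPVI.
    """
    rng = random.Random(seed)
    wins = [0] * len(q_bounds)
    for _ in range(n_samples):
        # Draw one Q-value per action and record which action is best.
        samples = [rng.uniform(lo, hi) for lo, hi in q_bounds]
        best = max(range(len(samples)), key=samples.__getitem__)
        wins[best] += 1
    return [w / n_samples for w in wins]


def select_action(q_bounds, n_samples=10_000, seed=0):
    """Return (index of the most-probably-optimal action, probabilities)."""
    probs = estimate_optimal_action_probs(q_bounds, n_samples, seed)
    action = max(range(len(probs)), key=probs.__getitem__)
    return action, probs


if __name__ == "__main__":
    # Action 0 has the larger upper bound but a very wide interval;
    # action 1 has a tight interval just below that upper bound.
    bounds = [(0.0, 3.1), (2.9, 3.0)]
    action, probs = select_action(bounds)
    print("selected action:", action)
    print("estimated probabilities:", probs)
```

In this example, a rule that always follows the largest upper bound would pick action 0, while the probability-based rule picks action 1 because action 0 is rarely the better of the two under the assumed distribution. This is only meant to show the contrast the abstract draws with optimistic exploration, not to reproduce the paper's procedure.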

Bibliographic record

Source: Pacific Rim International Conference on Artificial Intelligence, 2014, pp. 627-639 (13 pages)
Authors: Feng Liu; Bin Luo
Original format: PDF
Keywords: Intelligent agent; POPVI; Monte-Carlo method
