BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM

YAODONG NI; ZHI-QIANG LIU

Abstract

Partially observable Markov decision processes (POMDPs) are powerful tools for planning under uncertainty. However, it is usually impractical to model a real-life situation precisely with a POMDP that has exact parameters, for reasons such as limited data for learning the model and the inability of exact POMDPs to model dynamic situations. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration (ULVI) algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set. We propose four strategies for computing the U-set and the L-set. We theoretically analyze the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.
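
The abstract does not say how the parameter bounds are represented or how the U-set and L-set are built, so the following is only a rough illustration of what "imprecise but bounded" parameters can mean, not the authors' ULVI algorithm. It assumes that each transition probability out of a state is known only to lie in an interval, and it uses a greedy ordering step of the kind familiar from interval-bounded MDPs to find the most optimistic and most pessimistic distribution for a given value vector. The names `is_feasible` and `extreme_distribution` and all numbers are hypothetical.

```python
import numpy as np

def is_feasible(lower, upper, tol=1e-9):
    """Return True if some probability vector p satisfies lower <= p <= upper."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return bool(
        np.all(lower >= -tol)
        and np.all(lower <= upper + tol)
        and lower.sum() <= 1.0 + tol
        and upper.sum() >= 1.0 - tol
    )

def extreme_distribution(lower, upper, values, maximize=True):
    """Choose p within [lower, upper] with sum(p) == 1 that maximizes
    (or minimizes) the expected value p @ values.

    Assumption: interval bounds per next state. Start from the lower
    bounds, then hand the remaining probability mass to next states in
    order of preference until each upper bound is reached.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    values = np.asarray(values, dtype=float)
    if not is_feasible(lower, upper):
        raise ValueError("bounds admit no probability distribution")
    p = lower.copy()
    slack = 1.0 - p.sum()  # mass still to be assigned
    order = np.argsort(-values if maximize else values)
    for i in order:
        add = min(upper[i] - p[i], slack)
        p[i] += add
        slack -= add
    return p

# Hypothetical bounds for the three successors of one state-action pair.
lower = [0.1, 0.2, 0.3]
upper = [0.5, 0.6, 0.7]
values = [1.0, 2.0, 3.0]  # hypothetical values of the successor states

p_best = extreme_distribution(lower, upper, values, maximize=True)
p_worst = extreme_distribution(lower, upper, values, maximize=False)
print("optimistic:", p_best, "expected value:", p_best @ values)
print("pessimistic:", p_worst, "expected value:", p_worst @ values)
```

The gap between the two expected values gives a feel for how much a single backup can vary under bounded parameters. A full BPOMDP would also have to handle bounded observation probabilities and belief updates, which this sketch leaves out.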


Bibliographic record

Source
International Journal of Uncertainty, Fuzziness, and Knowledge-based Systems | 2013, Issue 6 | pp. 821-863 | 43 pages
Authors
YAODONG NI; ZHI-QIANG LIU
Author affiliations

School of Information Technology and Management, University of International Business and Economics, Beijing, 100029, China;

School of Creative Media, City University of Hong Kong, Hong Kong, China

Original format: PDF
Language of text: English
Keywords
Decision making under uncertainty; planning under uncertainty; bounded-parameter POMDP; modified value iteration; ULVI algorithm

