Point-Based Online Value Iteration Algorithm for POMDPs

Wu Bo (仵博); Wu Min (吴敏); She Jinhua (佘锦华)

Abstract

Partially observable Markov decision processes (POMDPs) are an ideal model for sequential decision-making in dynamic, uncertain environments. In practice, however, they are rarely deployed in engineering applications: existing offline algorithms suffer from the "curse of dimensionality" and the "curse of history" of the belief state, and existing online algorithms cannot achieve low error and real-time performance at the same time. To address these problems, this paper proposes a point-based online value iteration (PBOVI) algorithm for POMDPs. The algorithm performs value backups only at given reachable belief points rather than over the entire belief simplex, which speeds up solving. It prunes the AND/OR tree of belief states online using branch-and-bound, and it reuses belief-state nodes already solved at the previous time step to avoid repeated computation. Experimental and simulation results show that the algorithm achieves a low error rate and fast convergence, reducing the cost of computing policies while retaining policy quality, and thus meets the requirements of a real-time system.
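The abstract names two building blocks common to point-based POMDP solvers: updating a belief after an action and observation, and backing up the value function at a single belief point. The sketch below illustrates the generic textbook versions of both. It is not the authors' PBOVI implementation (it omits branch-and-bound pruning and node reuse), and the function names, the array layout `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`, and the Tiger-style toy model are assumptions chosen for illustration, not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def belief_update(b: Vector, a: int, o: int, T, O) -> Tuple[Vector, float]:
    """Bayes update: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b[s].

    Returns the new belief and the probability of observing o.
    """
    n = len(b)
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    p_o = sum(unnorm)
    if p_o == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / p_o for x in unnorm], p_o


def point_backup(b: Vector, Gamma: List[Vector], T, O, R,
                 gamma: float) -> Tuple[Vector, int]:
    """Generic point-based backup at one belief point b.

    Gamma is the current set of alpha-vectors. Returns the new
    alpha-vector that is best at b and the action that produced it.
    """
    n = len(b)
    n_obs = len(O[0][0])
    best_alpha: Vector = []
    best_action = -1
    best_value = float("-inf")
    for a in range(len(T)):
        alpha_a = list(R[a])  # immediate reward term
        for o in range(n_obs):
            # Project every alpha-vector back through (a, o).
            projected = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in Gamma
            ]
            # Keep only the projection that is best at this belief point.
            g = max(projected, key=lambda v: dot(v, b))
            alpha_a = [x + gamma * y for x, y in zip(alpha_a, g)]
        value = dot(alpha_a, b)
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action


# Hypothetical Tiger-style toy model (illustrative values only).
# States: 0 = tiger left, 1 = tiger right.
# Actions: 0 = listen, 1 = open left, 2 = open right.
# Observations: 0 = hear left, 1 = hear right.
TIGER_T = [
    [[1.0, 0.0], [0.0, 1.0]],   # listening does not move the tiger
    [[0.5, 0.5], [0.5, 0.5]],   # opening a door resets the problem
    [[0.5, 0.5], [0.5, 0.5]],
]
TIGER_O = [
    [[0.85, 0.15], [0.15, 0.85]],  # listening is 85% accurate
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
TIGER_R = [
    [-1.0, -1.0],
    [-100.0, 10.0],
    [10.0, -100.0],
]

if __name__ == "__main__":
    belief, p_obs = belief_update([0.5, 0.5], 0, 0, TIGER_T, TIGER_O)
    print("belief after listen / hear-left:", belief, "p(o) =", p_obs)
    alpha, action = point_backup([0.5, 0.5], [[0.0, 0.0]],
                                 TIGER_T, TIGER_O, TIGER_R, 0.95)
    print("backed-up alpha:", alpha, "greedy action:", action)
```

Because the backup touches only the supplied belief point, its cost grows with the number of alpha-vectors, actions, and observations, not with the size of the belief simplex. An online method in the spirit of the abstract would call `belief_update` to expand reachable beliefs from the current one and apply backups only there.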

Bibliographic details

Source
Journal of Software (《软件学报》) | 2013, No. 1 | pp. 25-36 | 12 pages
Authors
Wu Bo; Wu Min; She Jinhua
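Author affiliations

School of Information Science and Engineering, Central South University, Changsha, Hunan 410083

Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha, Hunan 410083

Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen, Guangdong 518055

School of Computer Science, Tokyo University of Technology, Tokyo 192-0982, Japan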
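Original format: PDF
Text language: Chinese
CLC classification: Theory, methods
Keywords
Partially observable Markov decision process; belief state; point-based algorithm; online algorithm; AND/OR tree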
