Computing reachability probabilities is an important subroutine for determining approximately optimal policies of MDPs [3]. Value iteration (VI) [2] is a well-known method for computing these values, but sequential implementations of VI are computationally expensive in both time and memory. We therefore propose a highly parallel version of VI for solving general MDPs on the GPU, which has been widely used in recent years to accelerate the execution of computational methods in many areas [1, 7, 6]. Our approach exploits algebraic features (e.g., matrix structure) of MDPs and uses action-based matrices to achieve massive parallelism. We empirically evaluate our approach on several case studies. The results show speedups of up to 10X over sequential VI, and our approach outperforms topological value iteration (TVI) [4] in most cases. In particular, for MDPs that contain no strongly connected components (SCCs) with more than one state, or that contain a small number of large SCCs, our approach achieves up to 17X speedup over TVI.
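To make the baseline concrete, the following is a minimal sketch of sequential value iteration for maximal reachability probabilities on a toy MDP. The state space, transition structure, and the tolerance `eps` are illustrative assumptions and are not taken from the paper; the GPU version parallelizes the per-state updates of exactly this kind of loop.

```python
# Toy MDP: state -> action -> list of (successor, probability).
# States, actions, and probabilities here are made up for illustration.
mdp = {
    0: {"a": [(0, 0.5), (1, 0.5)], "b": [(2, 1.0)]},
    1: {"a": [(1, 1.0)]},  # target state (absorbing)
    2: {"a": [(2, 1.0)]},  # sink state (absorbing)
}
target = {1}

def value_iteration(mdp, target, eps=1e-8):
    # x[s] approximates the maximal probability of reaching the target from s.
    x = {s: (1.0 if s in target else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s in target:
                continue
            # Bellman update: best expected value over all enabled actions.
            best = max(sum(p * x[t] for t, p in succ)
                       for succ in mdp[s].values())
            delta = max(delta, abs(best - x[s]))
            x[s] = best
        if delta < eps:   # stop once no state changes by more than eps
            return x

probs = value_iteration(mdp, target)
```

From state 0, action "a" eventually reaches the target with probability 1, so the iteration converges toward `probs[0] ≈ 1.0`, while the sink state keeps probability 0.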