首页> 外文会议>European Conference on Artificial Intelligence;Conference on Prestigious Applications of Intelligent Systems >GPU-Accelerated Value Iteration for the Computation of Reachability Probabilities in MDPs
【24h】

GPU-Accelerated Value Iteration for the Computation of Reachability Probabilities in MDPs

机译:GPU加速的MDP中可达性概率的价值迭代

获取原文

摘要

Computation of reachability probabilities is an important subroutine for determining approximately optimal policies of MDPs [3]. Value iteration (VI) [2] is a well-known method to compute these values. However, sequential implementation of VI is computationally expensive both in terms of time and memory. Hence, we propose a highly parallel version of VI to solve general MDPs utilizing the GPU, which is widely used in the recent years to accelerate execution performance of various computational methods in many areas [1, 7, 6]. Our approach explores algebraic features (e.g., matrix structure) of MDPs, and uses action-based matrices to achieve massive parallelism for efficiency. We empirically evaluate our approach on several case studies. Our results show that we can achieve up to 10X~ speedup compared to sequential VI, and outperform topological value iteration (TVI) [4] in most of the cases. Particularly, for MDPs which do not contain strongly connected components (SCCs) with more than one state, or which contain a small number of large SCCs, our approach achieves up to 17X speedup compared to TVI.
机译:可达性概率的计算是确定MDPS近似最佳策略的重要子程序[3]。值迭代(VI)[2]是一个众所周知的方法来计算这些值。然而,在时间和内存方面,VI的连续实现是计算昂贵的。因此,我们提出了一种高度平行的VI版本,以利用GPU解决一般MDP,该GPU在近年来广泛使用,以加速各种计算方法在许多领域的执行性能[1,7,6]。我们的方法探讨了MDP的代数特征(例如,矩阵结构),并使用基于动作的矩阵来实现效率的大规模行度。我们在若干案例研究中凭经验评估了我们的方法。我们的结果表明,与顺序VI相比,我们可以实现高达10x〜加速,并且在大多数情况下优于拓扑价值迭代(TVI)[4]。特别是,对于不包含多于一个状态的强大连接的组件(SCC)的MDP,或者包含少量大型SCC,我们的方法与TVI相比,我们的方法可以实现高达17倍的加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号