Mexican International Conference on Artificial Intelligence (MICAI 2007); November 4-10, 2007; Aguascalientes, Mexico

Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm



Abstract

The fundamental problem in learning and planning with Markov Decision Processes is how the agent explores and exploits an uncertain environment. The classical solutions to this problem are essentially heuristics that lack proper theoretical justification. As a result, principled solutions based on Bayesian estimation, though intractable even in small cases, have recently been investigated. The common approach is to approximate Bayesian estimation with sophisticated methods that cope with the intractability of computing the Bayesian posterior. However, we note that the complexity of these approximations still prevents their use, as the gain in long-term reward appears to be outweighed by the difficulties of implementation. In this work, we propose a deliberately simple model-based algorithm to demonstrate the benefits of Bayesian estimation over classical model-free solutions. In particular, our agent combines several Markov chains from its belief state and uses the matrix-based Elimination Algorithm to find the best action to take. We test our agent on the three standard problems Chain, Loop, and Maze, and find that it outperforms classical Q-Learning with the ε-Greedy, Boltzmann, and Interval Estimation action-selection heuristics.
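The core computation described above can be illustrated with a minimal sketch: fix each candidate action, form the Markov chain it induces under the agent's current model, and solve for that chain's expected discounted return by Gaussian elimination on the linear system (I − γP)v = r. The function names and the `models` interface below are assumptions made for illustration, not the authors' exact algorithm.

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's matrices are untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pivot: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j]
                              for j in range(i + 1, n))) / M[i][i]
    return x

def chain_values(P, r, gamma):
    """Expected discounted return v of a Markov chain, solving v = r + gamma*P v."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return gaussian_solve(A, r)

def best_action(state, models, gamma):
    """Pick the action whose induced chain promises the highest value at `state`.

    models[a] = (P_a, r_a): transition matrix and reward vector of the chain
    induced by committing to action a (a hypothetical interface).
    """
    return max(models, key=lambda a: chain_values(*models[a], gamma)[state])
```

For example, with two states, an action that stays in state 0 earning reward 1 yields value 1/(1 − γ) there, while one that alternates between the states yields less, so `best_action` prefers the former.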
