Mexican International Conference on Artificial Intelligence (MICAI 2007); November 4-10, 2007; Aguascalientes, Mexico

Simple Model-Based Exploration and Exploitation of Markov Decision Processes Using the Elimination Algorithm



Abstract

The fundamental problem in learning and planning with Markov Decision Processes is how the agent explores and exploits an uncertain environment. The classical solutions to this problem are essentially heuristics that lack proper theoretical justification. As a result, principled solutions based on Bayesian estimation, though intractable even in small cases, have recently been investigated. The common approach is to approximate Bayesian estimation with sophisticated methods that cope with the intractability of computing the Bayesian posterior. However, we note that the complexity of these approximations still prevents their use, as the gain in long-term reward appears to be outweighed by the difficulties of implementation. In this work, we propose a deliberately simple model-based algorithm to demonstrate the benefits of Bayesian estimation over classical model-free solutions. In particular, our agent combines several Markov chains from its belief state and uses the matrix-based Elimination Algorithm to find the best action to take. We test our agent on the three standard problems Chain, Loop, and Maze, and find that it outperforms classical Q-Learning with the ε-Greedy, Boltzmann, and Interval Estimation action-selection heuristics.
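The core computation described above can be illustrated with a minimal sketch: fix each candidate action, form the Markov chain it induces under the agent's current model, and solve for that chain's expected discounted return by Gaussian elimination on the linear system (I − γP)v = r. The function names and the `models` interface below are assumptions made for illustration, not the authors' exact algorithm.

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's matrices are untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pivot: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j]
                              for j in range(i + 1, n))) / M[i][i]
    return x

def chain_values(P, r, gamma):
    """Expected discounted return v of a Markov chain, solving v = r + gamma*P v."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return gaussian_solve(A, r)

def best_action(state, models, gamma):
    """Pick the action whose induced chain promises the highest value at `state`.

    models[a] = (P_a, r_a): transition matrix and reward vector of the chain
    induced by committing to action a (a hypothetical interface).
    """
    return max(models, key=lambda a: chain_values(*models[a], gamma)[state])
```

For example, with two states, an action that stays in state 0 earning reward 1 yields value 1/(1 − γ) there, while one that alternates between the states yields less, so `best_action` prefers the former.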
