Learning Representation and Control in Markov Decision Processes.

Abstract

This research investigated algorithms for approximately solving Markov decision processes (MDPs), a widely used model of sequential decision making. Much past work on solving MDPs in adaptive dynamic programming and reinforcement learning has assumed that representations, such as basis functions, are provided by a human expert. The research investigated a variety of approaches to automatic basis construction, including reward-sensitive and reward-invariant methods, diagonalization and dilation methods, and orthogonal and over-complete representations. A unifying perspective on the various basis construction methods emerges from showing that they result from different power series expansions of value functions, including the Neumann series expansion, the Laurent series expansion, and the Schultz expansion. The research also developed new computational algorithms for learning sparse solutions to MDPs using convex optimization methods.
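
The Neumann series expansion named in the abstract can be made concrete with a small sketch. For a fixed policy with transition matrix P, reward vector R, and discount factor gamma < 1, the value function satisfies V = R + gamma P V, which gives V = (I - gamma P)^{-1} R = sum over k >= 0 of gamma^k P^k R. The three-state chain, its numbers, and the helper names below are illustrative assumptions, not taken from the report; the code only checks that a truncated series satisfies the Bellman equation.

```python
# Hypothetical 3-state Markov chain under a fixed policy (illustrative numbers only).
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
R = [0.0, 0.0, 1.0]   # reward received in each state
GAMMA = 0.9           # discount factor, must be < 1 for the series to converge


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def neumann_value(P, R, gamma, terms):
    """Approximate V by the partial sum of gamma^k P^k R for k = 0 .. terms-1."""
    value = [0.0] * len(R)
    term = list(R)  # holds gamma^k P^k R, starting with k = 0
    for _ in range(terms):
        value = [v + t for v, t in zip(value, term)]
        term = [gamma * x for x in mat_vec(P, term)]
    return value


def bellman_residual(P, R, gamma, V):
    """Largest absolute violation of the fixed-point equation V = R + gamma P V."""
    PV = mat_vec(P, V)
    return max(abs(v - (r + gamma * pv)) for v, r, pv in zip(V, R, PV))


V = neumann_value(P, R, GAMMA, terms=500)
print([round(v, 4) for v in V])
print(bellman_residual(P, R, GAMMA, V))
```

The successive terms R, PR, P^2 R, and so on depend on the reward, which is one way to read the phrase "reward-sensitive" basis construction; whether the report builds its bases exactly this way is not stated in the abstract.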

Bibliographic details

Author: Mahadevan, S.
Year: 2013
Pages: 1-34
Total pages: 34
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords: Algorithms; Decision making; Markov processes; Problem solving; Air force research; Convex bodies; Dynamic programming; Learning; Methodology; Optimization; Series (Mathematics)

Similar literature

1. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes [J]. Mahadevan Sridhar, Maggioni Mauro. Journal of Machine Learning Research. 2007, October issue.
2. Controllable Markov Jump Processes. II. Monitoring and Optimization of TCP Connections [J]. Borisov A. V., Miller G. B., Stefanovich A. I. Journal of Computer and Systems Sciences International. 2019, issue 1.
3. Learning Representation and Control In Continuous Markov Decision Processes [C]. Sridhar Mahadevan, Mauro Maggioni, Kimberly Ferguson. National Conference on Artificial Intelligence (AAAI-06); Innovative Applications of Artificial Intelligence Conference (IAAI-06). 2006.
4. Model learning and application of partially observable Markov decision processes [D]. He, Lihan. 2008.
5. Modeling treatment of ischemic heart disease with partially observable Markov decision processes [O]. M. Hauskrecht, H. Fraser. 1998.
6. Learning representation and control in Markov decision processes: New frontiers [O]. Sridhar Mahadevan. 2014.
