Approximate Policy Iteration with a Policy Language Bias


Abstract

We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
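The idea of learning in policy space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy chain MDP, the hypothetical policy class `toward(target)` standing in for a policy-language bias, and all function names. Each iteration estimates action costs by rolling out the current policy, labels states with the cheaper action, and then fits the member of the restricted policy class that agrees with the most labels, instead of fitting a cost function.

```python
from typing import Callable, List, Tuple

N_STATES = 9            # toy chain of states 0..8 (illustrative assumption)
GOAL = 5                # absorbing goal state
ACTIONS = (-1, +1)      # move left or move right
HORIZON = 2 * N_STATES  # rollout cutoff

Policy = Callable[[int], int]


def step(state: int, action: int) -> int:
    """Deterministic move, clipped to the ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)


def rollout_cost(state: int, first_action: int, policy: Policy) -> int:
    """Steps to reach the goal when taking first_action, then following policy."""
    cost, s = 1, step(state, first_action)
    while s != GOAL and cost < HORIZON:
        s = step(s, policy(s))
        cost += 1
    # A rollout that never reaches the goal gets a fixed penalty cost.
    return cost if s == GOAL else HORIZON + 1


def toward(target: int) -> Policy:
    """Hypothetical restricted policy class: always walk toward a fixed target."""
    return lambda s: +1 if s < target else -1


def improved_labels(policy: Policy) -> List[Tuple[int, int]]:
    """Label each non-goal state with its rollout-best action, skipping ties."""
    data = []
    for s in range(N_STATES):
        if s == GOAL:
            continue
        q = {a: rollout_cost(s, a, policy) for a in ACTIONS}
        if q[-1] != q[+1]:  # ties carry no preference, so they are not used
            data.append((s, min(q, key=q.get)))
    return data


def fit(data: List[Tuple[int, int]]) -> int:
    """Learning step in policy space: pick the target agreeing with most labels."""
    def agreement(target: int) -> int:
        return sum(toward(target)(s) == a for s, a in data)
    return max(range(N_STATES), key=agreement)


def approximate_policy_iteration(initial_target: int = 0, iterations: int = 3) -> int:
    """Alternate rollout-based labeling and policy fitting; return the final target."""
    target = initial_target
    for _ in range(iterations):
        data = improved_labels(toward(target))
        if data:
            target = fit(data)
    return target


if __name__ == "__main__":
    print(approximate_policy_iteration())
```

In this toy setting the initial policy (always walk left) never reaches the goal from states to its left, so rollouts yield an informative label for only one of those states. The restricted policy class generalizes that single label to the remaining states, which is the role a language bias is meant to play here.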

Bibliographic details

Source
Annual Conference on Neural Information Processing Systems (NIPS); December 8-13, 2003; British Columbia (CA) | 2003 | pp. 847-854 | 8 pages
Conference location: British Columbia (CA)
Authors
Alan Fern; SungWook Yoon; Robert Givan
Author affiliation

Electrical and Computer Engineering, Purdue University, W. Lafayette, IN 47907

Conference organizer
Original format: PDF
Language of text: eng
Chinese Library Classification: Information processing
Keywords

Similar literature

1. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes [J]. Fern A., Givan R., Yoon S. The Journal of Artificial Intelligence Research. 2006, No. 12
2. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes [J]. A. Fern, S. Yoon, R. Givan. Journal of Automation, Mobile Robotics & Intelligent Systems. 2006, No. 5
3. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes [J]. Alan Fern, Sungwook Yoon, Robert Givan. The Journal of Artificial Intelligence Research. 2006, No. 0
4. Approximate Policy Iteration with a Policy Language Bias [C]. Alan Fern, SungWook Yoon, Robert Givan. Annual Conference on Neural Information Processing Systems. 2004
5. Energy Storage Applications of the Knowledge Gradient for Calibrating Continuous Parameters, Approximate Policy Iteration using Bellman Error Minimization with Instrumental Variables, and Covariance Matrix Estimation using an Errors-in-Variables Factor Model. [D]. Scott, Warren Robert. 2012
6. To what extent are Canadian second language policies evidence-based? Reflections on the intersections of research and policy [O]. Jim Cummins. 2014
7. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes [O]. Fern, A., Givan, R., Yoon, S. 2011