
FF+FPG: Guiding a Policy-Gradient Planner


Abstract

The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG's weakness is potentially long learning times, as it initially acts randomly and progressively improves its policy each time the goal is reached. This paper shows how to use an external teacher to guide FPG's exploration. While any teacher can be used, we concentrate on the actions suggested by FF's heuristic (Hoffmann 2001), as FF-replan has proved efficient for probabilistic re-planning. To achieve this, FPG must learn its own policy while following another. We thus extend FPG to off-policy learning using importance sampling (Glynn & Iglehart 1989; Peshkin & Shelton 2002). The resulting algorithm is presented and evaluated on IPC benchmarks.
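The off-policy correction described in the abstract can be illustrated with a minimal sketch: a REINFORCE-style gradient estimate where trajectories are sampled from a teacher policy (e.g. FF's suggested actions) and reweighted by an importance-sampling ratio so the estimate remains unbiased for the learner's own policy. This is a single-state toy with a softmax policy, not FPG's actual factored implementation; all names and the reward convention are illustrative.

```python
import math

def softmax(logits):
    """Stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def is_weighted_gradient(theta, trajectory, teacher_probs, reward):
    """Off-policy REINFORCE estimate via importance sampling (toy sketch).

    theta         -- per-action logits of the learner's softmax policy
    trajectory    -- action indices sampled from the *teacher* policy mu
    teacher_probs -- mu's action probabilities used to generate the trajectory
    reward        -- total return of the trajectory (e.g. 1 if goal reached)
    """
    probs = softmax(theta)
    # Importance weight: prod_t pi_theta(a_t) / mu(a_t)
    w = 1.0
    for a in trajectory:
        w *= probs[a] / teacher_probs[a]
    # grad log pi for a softmax policy: indicator(a) - pi
    grad = [0.0] * len(theta)
    for a in trajectory:
        for i in range(len(theta)):
            grad[i] += (1.0 if i == a else 0.0) - probs[i]
    # Reweighted gradient estimate for the learner's policy
    return [w * reward * g for g in grad]
```

When the teacher equals the learner the weight is 1 and this reduces to ordinary on-policy REINFORCE, which is the sense in which the teacher only reshapes exploration, not the objective.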

