
FF+FPG: Guiding a Policy-Gradient Planner


Abstract

The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG's weakness is potentially long learning times, as it initially acts randomly and progressively improves its policy each time the goal is reached. This paper shows how to use an external teacher to guide FPG's exploration. While any teacher can be used, we concentrate on the actions suggested by FF's heuristic (Hoffmann 2001), as FF-replan has proved efficient for probabilistic re-planning. To achieve this, FPG must learn its own policy while following another. We thus extend FPG to off-policy learning using importance sampling (Glynn & Iglehart 1989; Peshkin & Shelton 2002). The resulting algorithm is presented and evaluated on IPC benchmarks.
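The off-policy correction described in the abstract can be illustrated with a minimal sketch: a REINFORCE-style gradient estimate where trajectories are sampled from a teacher policy (e.g. FF's suggested actions) and reweighted by an importance-sampling ratio so the estimate remains unbiased for the learner's own policy. This is a single-state toy with a softmax policy, not FPG's actual factored implementation; all names and the reward convention are illustrative.

```python
import math

def softmax(logits):
    """Stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def is_weighted_gradient(theta, trajectory, teacher_probs, reward):
    """Off-policy REINFORCE estimate via importance sampling (toy sketch).

    theta         -- per-action logits of the learner's softmax policy
    trajectory    -- action indices sampled from the *teacher* policy mu
    teacher_probs -- mu's action probabilities used to generate the trajectory
    reward        -- total return of the trajectory (e.g. 1 if goal reached)
    """
    probs = softmax(theta)
    # Importance weight: prod_t pi_theta(a_t) / mu(a_t)
    w = 1.0
    for a in trajectory:
        w *= probs[a] / teacher_probs[a]
    # grad log pi for a softmax policy: indicator(a) - pi
    grad = [0.0] * len(theta)
    for a in trajectory:
        for i in range(len(theta)):
            grad[i] += (1.0 if i == a else 0.0) - probs[i]
    # Reweighted gradient estimate for the learner's policy
    return [w * reward * g for g in grad]
```

When the teacher equals the learner the weight is 1 and this reduces to ordinary on-policy REINFORCE, which is the sense in which the teacher only reshapes exploration, not the objective.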

