International Conference on Automated Planning and Scheduling (ICAPS 2007), 2007

Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies



Abstract

We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We describe a gradient-based approach to relational reinforcement learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy selection because the policies learnt are given in terms of relational control-rules. These rules are generated either (1) by reasoning from a first-order specification of the domain, or (2) more or less arbitrarily according to a taxonomic concept language. To this end the paper contributes a domain definition language for problems with temporally extended rewards, and a taxonomic concept language in which concepts and relations can be temporal. We evaluate our approach in versions of the miconic, logistics and blocks-world planning benchmarks and find that it is able to learn good policies. Our experiments show there is a significant advantage in making temporal concepts available in RRL for planning, whether rewards are temporally extended or not.
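As a rough illustration of the gradient-based idea described in the abstract, the sketch below assumes a stochastic policy given by a softmax over applicable actions, where each action is scored by the summed weight of the relational control rules that fire for it, and the weights are adjusted with a REINFORCE-style policy-gradient update. This is not the paper's implementation: the Rule class, the environment interface (reset, applicable_actions, step), and all parameter values are hypothetical placeholders.

import math
import random

class Rule:
    """A relational control rule: proposes an action when its condition holds."""
    def __init__(self, condition, action, weight=0.0):
        self.condition = condition      # callable: state -> bool
        self.action = action            # name of the action the rule recommends
        self.weight = weight            # learned parameter

def action_scores(rules, state, actions):
    # Each applicable action is scored by the total weight of the rules
    # that fire for it in the current state.
    scores = {a: 0.0 for a in actions}
    for r in rules:
        if r.action in scores and r.condition(state):
            scores[r.action] += r.weight
    return scores

def softmax_policy(rules, state, actions):
    # Gibbs/softmax action distribution induced by the rule weights.
    scores = action_scores(rules, state, actions)
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def reinforce_episode(rules, env, alpha=0.05, gamma=0.95):
    # Sample one episode with the current stochastic policy, then move every
    # rule weight along the log-likelihood gradient, scaled by the return.
    state, done = env.reset(), False
    trajectory, rewards = [], []
    while not done:
        actions = env.applicable_actions(state)
        probs = softmax_policy(rules, state, actions)
        action = random.choices(list(probs), weights=list(probs.values()))[0]
        trajectory.append((state, probs, action))
        state, reward, done = env.step(action)
        rewards.append(reward)

    # Discounted return from each step of the episode.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    # d log pi(a|s) / d w_r = fires_r(s) * ([r.action == a] - pi(r.action | s))
    for (s, probs, a), G in zip(trajectory, returns):
        for r in rules:
            if r.condition(s):
                grad = (1.0 if r.action == a else 0.0) - probs.get(r.action, 0.0)
                r.weight += alpha * G * grad

In the paper's setting the candidate rules would come either from reasoning over a first-order specification of the domain or from enumeration in the (temporal) taxonomic concept language; in this sketch they are simply whatever Rule objects the caller supplies.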
