International Conference on Automated Planning and Scheduling (ICAPS 2007)

Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies




We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We describe a gradient-based approach to relational reinforcement learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy selection because the policies learnt are given in terms of relational control-rules. These rules are generated either (1) by reasoning from a first-order specification of the domain, or (2) more or less arbitrarily according to a taxonomic concept language. To this end the paper contributes a domain definition language for problems with temporally extended rewards, and a taxonomic concept language in which concepts and relations can be temporal. We evaluate our approach in versions of the miconic, logistics and blocks-world planning benchmarks and find that it is able to learn good policies. Our experiments show there is a significant advantage in making temporal concepts available in RRL for planning, whether rewards are temporally extended or not.


