International Conference on Automated Planning and Scheduling (ICAPS 2007), 2007

Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies



Abstract

We consider the problem of computing general policies for decision-theoretic planning problems with temporally extended rewards. We describe a gradient-based approach to relational reinforcement learning (RRL) of policies for that setting. In particular, the learner optimises its behaviour by acting in a set of problems drawn from a target domain. Our approach is similar to inductive policy selection because the policies learnt are given in terms of relational control-rules. These rules are generated either (1) by reasoning from a first-order specification of the domain, or (2) more or less arbitrarily according to a taxonomic concept language. To this end the paper contributes a domain definition language for problems with temporally extended rewards, and a taxonomic concept language in which concepts and relations can be temporal. We evaluate our approach in versions of the miconic, logistics and blocks-world planning benchmarks and find that it is able to learn good policies. Our experiments show there is a significant advantage in making temporal concepts available in RRL for planning, whether rewards are temporally extended or not.
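As a rough illustration of the gradient-based idea described in the abstract, the sketch below assumes a stochastic policy given by a softmax over applicable actions, where each action is scored by the summed weight of the relational control rules that fire for it, and the weights are adjusted with a REINFORCE-style policy-gradient update. This is not the paper's implementation: the Rule class, the environment interface (reset, applicable_actions, step), and all parameter values are hypothetical placeholders.

import math
import random

class Rule:
    """A relational control rule: proposes an action when its condition holds."""
    def __init__(self, condition, action, weight=0.0):
        self.condition = condition      # callable: state -> bool
        self.action = action            # name of the action the rule recommends
        self.weight = weight            # learned parameter

def action_scores(rules, state, actions):
    # Each applicable action is scored by the total weight of the rules
    # that fire for it in the current state.
    scores = {a: 0.0 for a in actions}
    for r in rules:
        if r.action in scores and r.condition(state):
            scores[r.action] += r.weight
    return scores

def softmax_policy(rules, state, actions):
    # Gibbs/softmax action distribution induced by the rule weights.
    scores = action_scores(rules, state, actions)
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def reinforce_episode(rules, env, alpha=0.05, gamma=0.95):
    # Sample one episode with the current stochastic policy, then move every
    # rule weight along the log-likelihood gradient, scaled by the return.
    state, done = env.reset(), False
    trajectory, rewards = [], []
    while not done:
        actions = env.applicable_actions(state)
        probs = softmax_policy(rules, state, actions)
        action = random.choices(list(probs), weights=list(probs.values()))[0]
        trajectory.append((state, probs, action))
        state, reward, done = env.step(action)
        rewards.append(reward)

    # Discounted return from each step of the episode.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    # d log pi(a|s) / d w_r = fires_r(s) * ([r.action == a] - pi(r.action | s))
    for (s, probs, a), G in zip(trajectory, returns):
        for r in rules:
            if r.condition(s):
                grad = (1.0 if r.action == a else 0.0) - probs.get(r.action, 0.0)
                r.weight += alpha * G * grad

In the paper's setting the candidate rules would come either from reasoning over a first-order specification of the domain or from enumeration in the (temporal) taxonomic concept language; in this sketch they are simply whatever Rule objects the caller supplies.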
