JMLR: Workshop and Conference Proceedings

An Inference-Based Policy Gradient Method for Learning Options


Abstract

In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides a formalism for such abstraction over sequences of decisions. However, most models require that options be given a priori, presumably specified by hand, which is neither efficient nor scalable. It is preferable to learn options directly from interaction with the environment. Despite several efforts, this remains a difficult problem. In this work we develop a novel policy gradient method for the automatic learning of policies with options. The algorithm uses inference methods to simultaneously improve all of the options available to an agent, and thus can be employed in an off-policy manner, without observing option labels. The differentiable inference procedure employed yields options that can be easily interpreted. Empirical results confirm these attributes and indicate that our algorithm improves sample efficiency relative to the state of the art in learning options end-to-end.
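The mechanism described in the abstract, marginalizing over unobserved option labels so that a single gradient update improves every option at once, can be sketched with standard call-and-return option notation. The following is an illustrative reconstruction under those assumptions, not the paper's exact derivation: \pi_o denotes the intra-option policies, \beta_o the termination functions, \pi_\Omega the policy over options, and m_t an inferred belief over which option is active.

% Hedged sketch: a latent-option policy gradient under call-and-return
% option semantics; notation is illustrative, not taken from the paper.
\begin{align}
  \pi_\theta(a_t \mid h_t) &= \sum_{o} m_t(o)\, \pi_{o}(a_t \mid s_t), \\
  m_t(o) &\propto \sum_{o'} m_{t-1}(o')\, \pi_{o'}(a_{t-1} \mid s_{t-1})
      \Big[ (1 - \beta_{o'}(s_t))\, \mathbb{1}[o = o']
            + \beta_{o'}(s_t)\, \pi_{\Omega}(o \mid s_t) \Big], \\
  \nabla_\theta J(\theta) &= \mathbb{E}\Big[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid h_t)\, G_t \Big].
\end{align}

Because the belief m_t is computed by a differentiable forward recursion, the gradient of the marginal action likelihood flows into every option's policy and termination function without conditioning on an observed option label, which is what makes the label-free, off-policy style of learning described above possible.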
