JMLR: Workshop and Conference Proceedings

An Inference-Based Policy Gradient Method for Learning Options


Abstract

In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides a formalism for such abstraction over sequences of decisions. However, most models require that options be given a priori, presumably specified by hand, which is neither efficient nor scalable. It is preferable to learn options directly from interaction with the environment. Despite several efforts, this remains a difficult problem. In this work we develop a novel policy gradient method for the automatic learning of policies with options. The algorithm uses inference methods to simultaneously improve all of the options available to an agent, and thus can be employed in an off-policy manner, without observing option labels. The differentiable inference procedure employed yields options that can be easily interpreted. Empirical results confirm these attributes and indicate that our algorithm improves sample efficiency relative to the state of the art in learning options end-to-end.
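The mechanism described in the abstract, marginalizing over unobserved option labels so that a single gradient update improves every option at once, can be sketched with standard call-and-return option notation. The following is an illustrative reconstruction under those assumptions, not the paper's exact derivation: \pi_o denotes the intra-option policies, \beta_o the termination functions, \pi_\Omega the policy over options, and m_t an inferred belief over which option is active.

% Hedged sketch: a latent-option policy gradient under call-and-return
% option semantics; notation is illustrative, not taken from the paper.
\begin{align}
  \pi_\theta(a_t \mid h_t) &= \sum_{o} m_t(o)\, \pi_{o}(a_t \mid s_t), \\
  m_t(o) &\propto \sum_{o'} m_{t-1}(o')\, \pi_{o'}(a_{t-1} \mid s_{t-1})
      \Big[ (1 - \beta_{o'}(s_t))\, \mathbb{1}[o = o']
            + \beta_{o'}(s_t)\, \pi_{\Omega}(o \mid s_t) \Big], \\
  \nabla_\theta J(\theta) &= \mathbb{E}\Big[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid h_t)\, G_t \Big].
\end{align}

Because the belief m_t is computed by a differentiable forward recursion, the gradient of the marginal action likelihood flows into every option's policy and termination function without conditioning on an observed option label, which is what makes the label-free, off-policy style of learning described above possible.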
