
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

Abstract

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
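Below is a minimal sketch of the core idea as described in the abstract: an option's initiation set is conditioned on the previously executed option, while the top-level and intra-option policies stay memoryless. This is an illustration only, not the authors' implementation; all names (`Option`, `select_option`, the two example options) and the discrete observation space are assumptions.

```python
# Sketch of Option-Observation Initiation Sets (OOIs): hypothetical code,
# not the paper's implementation. Assumes discrete integer observations.
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Option:
    name: str
    policy: Callable[[int], int]        # memoryless intra-option policy: obs -> action
    terminates: Callable[[int], bool]   # termination condition beta(obs)
    # OOI: names of options after which this option may be initiated.
    # None means it may follow any option (or start the episode).
    predecessors: Optional[Set[str]] = None

    def can_initiate(self, prev: Optional["Option"]) -> bool:
        if self.predecessors is None:
            return True
        return prev is not None and prev.name in self.predecessors


def select_option(options: List[Option], prev: Optional[Option]) -> Option:
    """Memoryless top-level policy: pick among the options whose OOI admits
    the previously executed option. A learning agent would replace
    random.choice with, e.g., an epsilon-greedy choice over Q-values."""
    admissible = [o for o in options if o.can_initiate(prev)]
    return random.choice(admissible)


# Even though every policy is memoryless, the identity of the last option
# acts as memory: "sweep_right" may only run after "sweep_left", so the
# agent alternates without having to observe which direction it swept last.
sweep_left = Option("sweep_left", policy=lambda obs: 0,
                    terminates=lambda obs: obs == 0)
sweep_right = Option("sweep_right", policy=lambda obs: 1,
                     terminates=lambda obs: obs == 1,
                     predecessors={"sweep_left"})

current = select_option([sweep_left, sweep_right], prev=None)  # must pick sweep_left
print(current.name)
```

In this reading, the previously executed option plays the role of a finite state controller's internal node, which is consistent with the abstract's claim that options with OOIs are at least as expressive as FSCs.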