
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

Abstract

Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
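Below is a minimal sketch of the core idea as described in the abstract: an option's initiation set is conditioned on the previously executed option, while the top-level and intra-option policies stay memoryless. This is an illustration only, not the authors' implementation; all names (`Option`, `select_option`, the two example options) and the discrete observation space are assumptions.

```python
# Sketch of Option-Observation Initiation Sets (OOIs): hypothetical code,
# not the paper's implementation. Assumes discrete integer observations.
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Option:
    name: str
    policy: Callable[[int], int]        # memoryless intra-option policy: obs -> action
    terminates: Callable[[int], bool]   # termination condition beta(obs)
    # OOI: names of options after which this option may be initiated.
    # None means it may follow any option (or start the episode).
    predecessors: Optional[Set[str]] = None

    def can_initiate(self, prev: Optional["Option"]) -> bool:
        if self.predecessors is None:
            return True
        return prev is not None and prev.name in self.predecessors


def select_option(options: List[Option], prev: Optional[Option]) -> Option:
    """Memoryless top-level policy: pick among the options whose OOI admits
    the previously executed option. A learning agent would replace
    random.choice with, e.g., an epsilon-greedy choice over Q-values."""
    admissible = [o for o in options if o.can_initiate(prev)]
    return random.choice(admissible)


# Even though every policy is memoryless, the identity of the last option
# acts as memory: "sweep_right" may only run after "sweep_left", so the
# agent alternates without having to observe which direction it swept last.
sweep_left = Option("sweep_left", policy=lambda obs: 0,
                    terminates=lambda obs: obs == 0)
sweep_right = Option("sweep_right", policy=lambda obs: 1,
                     terminates=lambda obs: obs == 1,
                     predecessors={"sweep_left"})

current = select_option([sweep_left, sweep_right], prev=None)  # must pick sweep_left
print(current.name)
```

In this reading, the previously executed option plays the role of a finite state controller's internal node, which is consistent with the abstract's claim that options with OOIs are at least as expressive as FSCs.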