Learning Options for an MDP from Demonstrations

Abstract

The options framework provides a foundation for using hierarchical actions in reinforcement learning. An agent that uses options alongside primitive actions can decide, at any point in time, to perform a macro-action composed of many primitive actions rather than a single primitive action. Such macro-actions can be hand-crafted or learned. Previous work has learned them by exploring the environment. Here we take a different perspective and present an approach to learning options from a set of expert demonstrations. Empirical results are also presented in a setting similar to the one used in other works in this area.
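
As a rough illustration of what a macro-action looks like in code, the sketch below models an option in the standard way, as an initiation set, an internal policy, and a termination condition, and runs it on a toy corridor. Everything here is an assumption made for illustration: the names `Option`, `step`, `run_option`, and `go_to_door`, the 1-D corridor, and the deterministic termination test (the general framework allows a termination probability). It does not reproduce the paper's method for learning options from demonstrations.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

State = int   # hypothetical: positions 0..size on a 1-D corridor
Action = int  # hypothetical primitive actions: -1 (left) or +1 (right)


@dataclass
class Option:
    """A macro-action: where it may start, how it acts, when it stops."""
    initiation_set: Set[State]
    policy: Callable[[State], Action]
    terminates: Callable[[State], bool]  # simplification: deterministic


def step(state: State, action: Action, size: int = 10) -> State:
    """Toy deterministic transition: move, then clamp to [0, size]."""
    return max(0, min(size, state + action))


def run_option(option: Option, state: State,
               max_steps: int = 100) -> Tuple[State, List[State]]:
    """Execute the option's primitive actions until it terminates."""
    if state not in option.initiation_set:
        raise ValueError("option cannot be started in this state")
    trace = [state]
    for _ in range(max_steps):
        if option.terminates(state):
            break
        state = step(state, option.policy(state))
        trace.append(state)
    return state, trace


# Hypothetical option: walk right until reaching a doorway at position 5.
go_to_door = Option(
    initiation_set=set(range(0, 5)),
    policy=lambda s: +1,
    terminates=lambda s: s == 5,
)

final_state, trace = run_option(go_to_door, 2)
print(final_state, trace)  # 5 [2, 3, 4, 5]
```

From the agent's point of view, `go_to_door` is a single choice that unfolds into three primitive moves, which is the sense in which an option acts as a macro-action.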

Bibliographic details

Source: Australasian Conference on Artificial Life and Computational Intelligence, 2015, 17 pages
Authors: Marco Tamassia; Fabio Zambetta; William Raffe; Xiaodong Li
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Reinforcement learning; Options
