European conference on machine learning and knowledge discovery in databases

Learning from Demonstrations: Is It Worth Estimating a Reward Function?



Abstract

This paper provides a comparative study of Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, based on Markov Decision Processes (MDP), for the imitation learning problem, where an agent tries to learn from the demonstrations of an expert. In the AL framework, the agent tries to learn the expert's policy directly, whereas in the IRL framework, the agent tries to learn a reward that explains the behavior of the expert; this reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not yet been addressed in the literature. We provide partial answers, from both a theoretical and an empirical point of view.
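To make the contrast concrete, below is a minimal sketch of the two routes to imitation on a toy chain MDP. It is not the paper's algorithms: the chain dynamics, the majority-vote policy imitation, and the visitation-count reward estimate are all illustrative stand-ins for a real AL classifier and a real IRL method.

```python
# Toy comparison: imitate a policy directly (AL-style) vs. estimate a
# reward and then optimize it (IRL-style). All modeling choices here are
# illustrative assumptions, not the methods studied in the paper.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# Deterministic chain: action 0 moves left, action 1 moves right.
def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Expert demonstrations: the expert always moves right (toward state 4).
demos = [(s, 1) for s in range(n_states)]

# --- AL-style route: learn the expert's policy directly. ---
# Reduced here to the majority expert action per state, a crude
# behavioral-cloning stand-in for a supervised classifier.
policy_al = np.zeros(n_states, dtype=int)
for s in range(n_states):
    actions = [a for (s_, a) in demos if s_ == s]
    policy_al[s] = max(set(actions), key=actions.count) if actions else 0

# --- IRL-style route: estimate a reward, then optimize it. ---
# Crude reward estimate: reward states in proportion to how often the
# expert's transitions land on them (a stand-in for a real IRL method).
visits = np.zeros(n_states)
for s, a in demos:
    visits[step(s, a)] += 1
reward = visits / visits.sum()

def value_iteration(reward, tol=1e-8):
    """Solve the toy MDP for the estimated reward; return a greedy policy."""
    V = np.zeros(n_states)
    while True:
        Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

policy_irl = value_iteration(reward)
print("AL policy :", policy_al)   # actions copied from the expert
print("IRL policy:", policy_irl)  # actions optimizing the learned reward
```

On this trivial example both routes recover the same behavior; the paper's question is precisely when the extra step of estimating a reward pays off, for instance in how the two routes generalize or degrade.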
