IEEE Transactions on Neural Networks and Learning Systems

Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning



Abstract

Learning from demonstrations is a paradigm by which an apprentice agent learns a control policy for a dynamic environment by observing demonstrations delivered by an expert agent. It is usually implemented as either imitation learning (IL) or inverse reinforcement learning (IRL) in the literature. On the one hand, IRL is a paradigm relying on Markov decision processes, where the goal of the apprentice agent is to find a reward function from the expert demonstrations that could explain the expert behavior. On the other hand, IL consists in directly generalizing the expert strategy, observed in the demonstrations, to unvisited states (it is therefore close to classification when there is a finite set of possible decisions). While these two views are often considered as opposed to each other, the purpose of this paper is to exhibit a formal link between them from which new algorithms can be derived. We show that IL and IRL can be redefined in such a way that they are equivalent, in the sense that there exists an explicit bijective operator (namely, the inverse optimal Bellman operator) between their respective spaces of solutions. To do so, we introduce the set-policy framework, which creates a clear link between IL and IRL. As a result, IL and IRL solutions making the best of both worlds are obtained. In addition, this unifying framework allows existing IL and IRL algorithms to be derived and opens the way to IL methods able to deal with the environment's dynamics. Finally, the IRL algorithms derived from the set-policy framework are compared with algorithms belonging to the more common trajectory-matching family. Experiments demonstrate that the set-policy-based algorithms outperform both the standard IRL and IL ones and yield more robust solutions.
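The "inverse optimal Bellman operator" mentioned in the abstract can be made concrete with a short derivation. The sketch below uses generic MDP notation and an illustrative operator symbol of our own choosing, not necessarily the paper's; it only recalls why inverting the optimal Bellman equation yields an explicit map from action-value functions to rewards, which is the kind of correspondence between the IL and IRL solution spaces the abstract refers to.

```latex
% Sketch in generic MDP notation (illustrative; the paper's own notation may differ).
% For a reward R and discount factor \gamma, the optimal Bellman equation reads
\[
  Q^{*}(s,a) \;=\; R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q^{*}(s',a').
\]
% Read in the other direction, any bounded Q can be mapped to the reward
\[
  (\mathcal{J} Q)(s,a) \;=\; Q(s,a) \;-\; \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a'),
\]
% for which Q satisfies the optimality equation by construction; by uniqueness of
% the Bellman fixed point (for \gamma < 1), Q is the optimal action-value function
% of the reward \mathcal{J} Q. Pairing value functions whose greedy policies match
% the expert's decisions (the IL side) with the rewards they induce through
% \mathcal{J} (the IRL side) gives an explicit, invertible correspondence of the
% kind described in the abstract.
```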


