Expert Systems with Applications

Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement


Abstract

Inverse reinforcement learning (IRL) is a powerful tool for teaching by demonstration, provided that sufficiently diverse and optimal demonstrations are given and the learner agent perceives those demonstrations correctly. These conditions are hard to meet in practice: a trainer cannot cover all possibilities through demonstrations and may partially fail to follow the optimal behavior. Moreover, the trainer and the learner have different perceptions of the environment, including of the trainer's actions. A practical way to overcome these problems is to combine the trainer's demonstrations with feedbacks. We propose an interactive learning approach that overcomes the challenge of non-optimal demonstrations by integrating human evaluative feedbacks into the IRL process, given sufficiently diverse demonstrations and the domain transition model. To this end, we develop a probabilistic model of human feedbacks and iteratively improve the agent's policy using Bayes' rule. We then integrate this information into an extended IRL algorithm to enhance the learned reward function. We examine the developed approach on one experimental and two simulated tasks: grid-world navigation, a highway car-driving system, and a navigation task with the e-puck robot. The obtained results show significantly improved efficiency of the proposed approach in the face of different levels of non-optimality in the demonstrations and varying numbers of evaluative feedbacks.
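The core technical step described above is a Bayes-rule update of the agent's policy from noisy human evaluative feedback. The sketch below illustrates one plausible reading of that step, not the authors' actual algorithm: it assumes a single feedback-consistency parameter (the probability that the trainer's feedback is correct), a toy grid world, and an illustrative feedback log; all names and values are assumptions for exposition.

```python
import numpy as np

def update_belief(belief, feedback, consistency=0.9):
    """Bayes-rule update of P(action is optimal in this state)
    given a +1 (approve) or -1 (disapprove) feedback signal."""
    if feedback > 0:   # trainer approved the action
        like_opt, like_sub = consistency, 1.0 - consistency
    else:              # trainer disapproved the action
        like_opt, like_sub = 1.0 - consistency, consistency
    post = like_opt * belief
    return post / (post + like_sub * (1.0 - belief))

# Toy grid world: uniform prior over which action is optimal in each state.
n_states, n_actions = 4, 3
belief = np.full((n_states, n_actions), 1.0 / n_actions)

# Hypothetical feedback log: (state, action, feedback).
feedback_log = [(0, 1, +1), (0, 2, -1), (3, 0, +1)]
for s, a, f in feedback_log:
    belief[s, a] = update_belief(belief[s, a], f)

# Greedy policy with respect to the posterior belief.
policy = belief.argmax(axis=1)
print(policy)
```

In the paper's setting, a belief of this kind would then feed back into the extended IRL step to refine the learned reward function rather than being used directly as the final policy.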
