Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations

Abstract

Inverse reinforcement learning (IRL) addresses the problem of recovering the unknown reward function for a given Markov decision problem (MDP) given the corresponding optimal policy or a perturbed version thereof. This paper studies the space of possible solutions to the general IRL problem, when the agent is provided with incomplete/imperfect information regarding the optimal policy for the MDP whose reward must be estimated. We focus on scenarios with finite state-action spaces and discuss the constraints imposed on the set of possible solutions when the agent is provided with (i) perturbed policies; (ii) optimal policies; and (iii) incomplete policies. We discuss previous works on IRL in light of our analysis and show that, with our characterization of the solution space, it is possible to determine non-trivial closed-form solutions for the IRL problem. We also discuss several other interesting aspects of the IRL problem that stem from our analysis.
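
As a rough illustration of the solution-space view described in the abstract, the sketch below checks whether a candidate reward is consistent with a given deterministic policy in a small finite MDP. It is not the paper's method: it relies only on the standard fact that a policy is optimal exactly when it is greedy with respect to its own Q-values, and it assumes state-only rewards and a discounted infinite-horizon setting. The function name `is_reward_consistent`, the two-state MDP, and the discount `gamma` are hypothetical choices made for this example.

```python
import numpy as np


def is_reward_consistent(P, policy, reward, gamma=0.9, tol=1e-9):
    """Return True if `policy` is optimal for the MDP (P, reward, gamma).

    P      : array of shape (n_actions, n_states, n_states), P[a, s, s'].
    policy : sequence of length n_states giving the action taken in each state.
    reward : array of shape (n_states,), a state-only reward (an assumption).
    """
    P = np.asarray(P, dtype=float)
    policy = np.asarray(policy, dtype=int)
    reward = np.asarray(reward, dtype=float)
    n_states = P.shape[1]
    states = np.arange(n_states)

    # Transition matrix induced by the policy: row s is P[policy[s], s, :].
    P_pi = P[policy, states, :]
    # Policy evaluation: V = (I - gamma * P_pi)^{-1} r.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, reward)
    # Q[a, s] = r(s) + gamma * sum_{s'} P[a, s, s'] * V(s').
    Q = reward[None, :] + gamma * (P @ V)
    # The policy is optimal iff its action attains the maximum Q in every state.
    return bool(np.all(Q[policy, states] >= Q.max(axis=0) - tol))


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
policy = [1, 0]  # move to state 1, then stay there

print(is_reward_consistent(P, policy, [0.0, 1.0]))  # True: state 1 is rewarded
print(is_reward_consistent(P, policy, [0.0, 0.0]))  # True: zero reward fits any policy
print(is_reward_consistent(P, policy, [1.0, 0.0]))  # False: staying in state 0 would be better
```

Two different rewards pass the check for the same policy, including the all-zero reward, which shows in miniature why the IRL problem admits a whole set of solutions rather than a single one.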

Bibliographic details

Source: European Conference on Artificial Intelligence, 2010, 6 pages
Authors: Francisco S. Melo; Manuel Lopes; Ricardo Ferreira
Original format: PDF
Chinese Library Classification: TP18-53

