Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance

Abstract

In this paper, we study Reinforcement Learning from Demonstrations (RLfD), which improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most existing RLfD methods require the demonstrations to be perfect and sufficient, a requirement that is rarely met in practice. To handle imperfect demonstrations, we first formally define an imperfect-expert setting for RLfD and then point out that previous methods suffer from two issues, one concerning optimality and the other concerning convergence. Building on these theoretical findings, we address both issues by treating the expert guidance as a soft constraint that regulates the agent's policy exploration, which leads to a constrained optimization problem. We further show that this problem can be solved efficiently by performing a local linear search on its dual form. Extensive empirical evaluations on a comprehensive collection of benchmarks indicate that our method achieves consistent improvements over other RLfD counterparts.
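To make the "soft constraint, then search on the dual" idea concrete, here is a toy sketch. It is not the authors' algorithm: the one-dimensional policy parameter `theta`, the quadratic return `J` and expert divergence `D`, the threshold `eps`, and all function names are hypothetical, and a bisection search over the scalar Lagrange multiplier is swapped in for the paper's local linear search. It only illustrates how requiring `D(theta) <= eps` turns into a one-dimensional search over the dual variable.

```python
# Toy, hypothetical instance (not from the paper):
#   return      J(theta) = -(theta - 3)**2  -> unconstrained optimum at theta = 3
#   expert gap  D(theta) = (theta - 1)**2   -> an imperfect "expert" at theta = 1
# Soft expert guidance: maximize J(theta) subject to D(theta) <= eps.


def best_response(lam: float) -> float:
    """Maximizer of the Lagrangian J(theta) - lam * (D(theta) - eps) for fixed lam >= 0.

    Setting the derivative -2(theta - 3) - 2*lam*(theta - 1) to zero
    gives theta = (3 + lam) / (1 + lam).
    """
    return (3.0 + lam) / (1.0 + lam)


def constraint_gap(lam: float, eps: float) -> float:
    """D(theta*(lam)) - eps; a positive value means the constraint is violated."""
    theta = best_response(lam)
    return (theta - 1.0) ** 2 - eps


def solve_dual(eps: float, lam_hi: float = 1e6, tol: float = 1e-10) -> tuple[float, float]:
    """Bisection over the scalar dual variable lam (a stand-in for the paper's search).

    The gap decreases monotonically in lam here, so the smallest lam that makes
    it non-positive yields the constrained optimum theta*(lam).
    """
    if constraint_gap(0.0, eps) <= 0.0:
        return 0.0, best_response(0.0)  # constraint inactive: plain RL optimum
    lo, hi = 0.0, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if constraint_gap(mid, eps) > 0.0:
            lo = mid  # still too far from the expert: raise the penalty
        else:
            hi = mid  # feasible: try a smaller penalty
    return hi, best_response(hi)


lam, theta = solve_dual(eps=1.0)
print(f"lambda = {lam:.4f}, theta = {theta:.4f}")  # lambda = 1.0000, theta = 2.0000
```

In this toy case the solution `theta = 2` lies between the imperfect expert (`theta = 1`) and the unconstrained optimum (`theta = 3`), so the agent improves on the expert's return while staying within the allowed divergence; with a loose threshold such as `eps = 5` the constraint is inactive and the search returns `lambda = 0`.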

Bibliographic Information

Source
AAAI Conference on Artificial Intelligence | 2020 | pp. 4667-5453 | 8 pages
Authors
Mingxuan Jing; Xiaojian Ma; Wenbing Huang; Fuchun Sun; Chao Yang; Bin Fang; Huaping Liu
Original format: PDF
Chinese Library Classification: TP18-53