Conference: IEEE Data Driven Control and Learning Systems Conference

Maximum Entropy Inverse Reinforcement Learning Based on Behavior Cloning of Expert Examples



Abstract

This study proposes a preprocessing framework for expert examples, based on behavior cloning (BC), to address the problem that inverse reinforcement learning (IRL) becomes inaccurate when the expert examples are noisy. To remove the noise, we first use supervised learning to learn an approximate expert policy, and then use this approximate policy to clone new expert examples from the old ones; the idea behind this preprocessing framework is BC, and after preprocessing IRL obtains higher-quality expert examples. The IRL framework takes the maximum entropy form. Experiments demonstrate the effectiveness of the proposed approach: when the expert examples are noisy, the reward functions recovered after BC preprocessing are better than those recovered without preprocessing, and the advantage becomes especially pronounced as the noise level increases.
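For context on "the form of maximum entropy": the standard maximum entropy IRL model (Ziebart et al., 2008) makes trajectories exponentially more likely as their reward increases; the paper's exact variant may differ, so this is shown only as the canonical formulation:

```latex
P(\tau \mid \theta) \;=\; \frac{\exp\!\big(\theta^{\top}\mathbf{f}_{\tau}\big)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \sum_{\tau'} \exp\!\big(\theta^{\top}\mathbf{f}_{\tau'}\big)
```

where $\theta$ parameterizes a linear reward over trajectory feature counts $\mathbf{f}_{\tau}$, and $\theta$ is fit by maximizing the likelihood of the (here, BC-preprocessed) expert examples.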
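The two-step preprocessing described above can be sketched as follows. This is a minimal illustration on a hypothetical synthetic task (the task, noise rate, and logistic-regression policy are all assumptions, not the paper's experiments): a noisy expert dataset is fit with a supervised policy, and the cloned examples are then produced by relabeling every state with that policy's action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic task: the true expert picks action 1 when the
# first state feature is positive, action 0 otherwise.
states = rng.normal(size=(500, 4))
true_actions = (states[:, 0] > 0).astype(float)

# Noisy expert examples: flip roughly 20% of the action labels.
noise_mask = rng.random(500) < 0.2
noisy_actions = np.where(noise_mask, 1 - true_actions, true_actions)

# Step 1 (BC): learn an approximate expert policy by supervised learning
# (plain logistic regression trained with gradient descent).
X = np.hstack([states, np.ones((500, 1))])   # append a bias column
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted P(action = 1 | state)
    w -= 0.1 * X.T @ (p - noisy_actions) / len(X)

# Step 2: clone new expert examples by relabeling each state with the
# approximate policy's greedy action.
cloned_actions = (X @ w > 0).astype(float)

noisy_err = np.mean(noisy_actions != true_actions)
cloned_err = np.mean(cloned_actions != true_actions)
print(f"noisy error {noisy_err:.2f}, cloned error {cloned_err:.2f}")
```

On this toy task the cloned dataset disagrees with the true expert far less often than the noisy one does, which is the effect the preprocessing framework relies on before handing the examples to IRL.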
