European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees


Abstract

Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, to estimate the expected cumulative reward following a state-action pair. The Q function neural network contains a lot of implicit knowledge about the RL problems, but often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel on-line algorithm that is well-suited for an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned strategic knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs. Code related to this paper is available at: https://github.com/Guiliang/uTree-mimic_mountain_car.
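The abstract describes mimic learning in an active play setting: a transparent regressor is trained on the Q values that the deep network produces while it keeps interacting with the environment. Below is a minimal sketch of that data-collection-and-fit loop, not the authors' implementation; it assumes a trained Q-network exposing a predict(state) method and a Gym-style environment (env.reset, env.step), and it uses a plain scikit-learn regression tree as a stand-in for the paper's Linear Model U-Tree.

    # Mimic-learning sketch (assumed interfaces, stand-in tree model).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def collect_active_play_data(q_network, env, n_steps=5000):
        """Record (state, action) -> Q(state, action) pairs while the
        Q-network's greedy policy interacts with the environment."""
        inputs, targets = [], []
        state = env.reset()
        for _ in range(n_steps):
            q_values = q_network.predict(state)      # soft outputs of the mimicked net
            action = int(np.argmax(q_values))        # greedy action of the DRL agent
            inputs.append(np.append(state, action))  # features: state plus chosen action
            targets.append(q_values[action])         # regression target: predicted Q value
            state, _, done, _ = env.step(action)
            if done:
                state = env.reset()
        return np.array(inputs), np.array(targets)

    def fit_mimic_tree(inputs, targets, max_leaf_nodes=64):
        """Fit a transparent tree regressor to the Q-network's outputs."""
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(inputs, targets)
        return tree

The paper's on-line LMUT algorithm additionally fits a linear model in each leaf and updates the tree incrementally as new transitions arrive; the batch tree above only illustrates the overall mimic-learning setup.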


