ICMLA 2012: International Conference on Machine Learning and Applications

An Inverse Reinforcement Learning Algorithm for Partially Observable Domains with Application on Healthcare Dialogue Management



Abstract

In this paper, we propose an algorithm for learning a reward model from an expert policy in partially observable Markov decision processes (POMDPs). The problem is formulated as inverse reinforcement learning (IRL) in the POMDP framework. The proposed algorithm then uses the expert trajectories to find an unknown reward model based on the known POMDP model components. Similar to previous IRL work in Markov decision processes (MDPs), our algorithm maximizes the sum of the margins between the expert policy and the intermediate candidate policies. However, in contrast to previous work, the expert and intermediate candidate policy values are approximated using beliefs recovered from the expert trajectories, specifically by approximating the expert's belief transitions. We apply our IRL algorithm to a healthcare dialogue POMDP whose model components are estimated from real dialogues. Our experimental results show that the proposed algorithm learns a reward model that accounts for the expert policy.
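The margin-maximization step the abstract describes can be illustrated with a minimal sketch, assuming (as is common in max-margin IRL, though not stated in this abstract) a reward linear in belief-action features, R(b, a) = w·φ(b, a), so each policy's value is w·μ for its discounted feature expectations μ. The summed margin Σ_i [V^{π_E}(w) − V^{π_i}(w)] over intermediate candidate policies then reduces to a linear program in w. All names below (phi, mu_E, the candidate expectations) are illustrative, not from the paper.

```python
# Minimal sketch of one max-margin reward update, under the linear-reward
# assumption above. Not the paper's implementation.
import numpy as np
from scipy.optimize import linprog

def expert_feature_expectations(trajectories, gamma=0.95):
    """Discounted feature expectations mu_E averaged over expert belief
    trajectories; each trajectory is a list of feature vectors phi(b_t, a_t)
    built from beliefs recovered from the expert's observed dialogues."""
    mus = []
    for traj in trajectories:
        mu = sum(gamma**t * phi for t, phi in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

def max_margin_weights(mu_E, candidate_mus):
    """Choose weights w (|w_j| <= 1) maximizing the summed margin
    sum_i (mu_E - mu_i) . w between the expert and each candidate policy."""
    gaps = [mu_E - mu for mu in candidate_mus]
    c = -np.sum(gaps, axis=0)                 # linprog minimizes, so negate
    res = linprog(c, bounds=[(-1, 1)] * len(mu_E))
    return res.x

# Toy usage with synthetic 3-dimensional features: two expert trajectories
# and one intermediate candidate policy summarized by its expectations.
expert_trajs = [[np.array([1., 0., 0.]), np.array([0., 1., 0.])],
                [np.array([1., 0., 0.]), np.array([0., 0., 1.])]]
mu_E = expert_feature_expectations(expert_trajs)
candidates = [np.array([0.2, 0.1, 0.1])]
w = max_margin_weights(mu_E, candidates)
print("reward weights:", w)
```

In the full loop, the POMDP would be solved under the updated reward w to produce a new candidate policy, whose feature expectations are appended to the candidate set before the next update.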
