ICMLA 2012: International Conference on Machine Learning and Applications

An Inverse Reinforcement Learning Algorithm for Partially Observable Domains with Application on Healthcare Dialogue Management



Abstract

In this paper, we propose an algorithm for learning a reward model from an expert policy in partially observable Markov decision processes (POMDPs). The problem is formulated as inverse reinforcement learning (IRL) in the POMDP framework. The proposed algorithm then uses the expert trajectories to find an unknown reward model based on the known POMDP model components. Similar to previous IRL work in Markov decision processes (MDPs), our algorithm maximizes the sum of the margins between the expert policy and the intermediate candidate policies. However, in contrast to previous work, the expert and intermediate candidate policy values are approximated using beliefs recovered from the expert trajectories, specifically by approximating the expert's belief transitions. We apply our IRL algorithm to a healthcare dialogue POMDP whose model components are estimated from real dialogues. Our experimental results show that the proposed algorithm learns a reward model that accounts for the expert policy.
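The margin-maximization step the abstract describes can be illustrated with a minimal sketch, assuming (as is common in max-margin IRL, though not stated in this abstract) a reward linear in belief-action features, R(b, a) = w·φ(b, a), so each policy's value is w·μ for its discounted feature expectations μ. The summed margin Σ_i [V^{π_E}(w) − V^{π_i}(w)] over intermediate candidate policies then reduces to a linear program in w. All names below (phi, mu_E, the candidate expectations) are illustrative, not from the paper.

```python
# Minimal sketch of one max-margin reward update, under the linear-reward
# assumption above. Not the paper's implementation.
import numpy as np
from scipy.optimize import linprog

def expert_feature_expectations(trajectories, gamma=0.95):
    """Discounted feature expectations mu_E averaged over expert belief
    trajectories; each trajectory is a list of feature vectors phi(b_t, a_t)
    built from beliefs recovered from the expert's observed dialogues."""
    mus = []
    for traj in trajectories:
        mu = sum(gamma**t * phi for t, phi in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

def max_margin_weights(mu_E, candidate_mus):
    """Choose weights w (|w_j| <= 1) maximizing the summed margin
    sum_i (mu_E - mu_i) . w between the expert and each candidate policy."""
    gaps = [mu_E - mu for mu in candidate_mus]
    c = -np.sum(gaps, axis=0)                 # linprog minimizes, so negate
    res = linprog(c, bounds=[(-1, 1)] * len(mu_E))
    return res.x

# Toy usage with synthetic 3-dimensional features: two expert trajectories
# and one intermediate candidate policy summarized by its expectations.
expert_trajs = [[np.array([1., 0., 0.]), np.array([0., 1., 0.])],
                [np.array([1., 0., 0.]), np.array([0., 0., 1.])]]
mu_E = expert_feature_expectations(expert_trajs)
candidates = [np.array([0.2, 0.1, 0.1])]
w = max_margin_weights(mu_E, candidates)
print("reward weights:", w)
```

In the full loop, the POMDP would be solved under the updated reward w to produce a new candidate policy, whose feature expectations are appended to the candidate set before the next update.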
