Goal-Oriented Dialogue Policy Learning from Failures

Abstract

Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues and the very few successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but vanilla HER is inapplicable to dialogue learning because dialogue goals are implicit. In this work, we develop two complex HER methods that provide different tradeoffs between complexity and performance and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in terms of learning rate.
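
For readers unfamiliar with HER, the following is a minimal sketch of vanilla HER relabeling on a toy episode with an explicit goal. It is not the authors' dialogue method. The `Transition` type, the `sparse_reward` function, and the "final achieved state" relabeling choice are assumptions made for illustration. It shows why vanilla HER needs an explicit goal stored with each transition, which is the requirement the abstract says dialogues do not meet.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    """One goal-conditioned step (hypothetical type for this sketch)."""
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(next_state: State, goal: State) -> float:
    # Assumed sparse reward: 1 only when the goal is reached exactly.
    return 1.0 if next_state == goal else 0.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, pretending its final achieved state was the goal."""
    if not episode:
        return []
    achieved = episode[-1].next_state
    return [
        replace(t, goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# A failed two-step episode: the agent wanted (3,) but only reached (2,).
goal = (3,)
episode = [
    Transition((0,), 1, (1,), goal, sparse_reward((1,), goal)),
    Transition((1,), 1, (2,), goal, sparse_reward((2,), goal)),
]

# Store both the original failure and its relabeled, "successful" copy.
replay_buffer = episode + her_relabel_final(episode)
```

The relabeled copy carries a nonzero reward on its last step, so a failed episode still contributes a learning signal. In a dialogue the user's goal is never observed directly, so there is no `goal` field to overwrite in this simple way.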

Bibliographic details

Source: AAAI Conference on Artificial Intelligence | 2019 | pp. 1723-2636 | 8 pages
Authors: Keting Lu; Shiqi Zhang; Xiaoping Chen
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Towards integrated dialogue policy learning for multiple domains and intents using Hierarchical Deep Reinforcement Learning [J]. Saha Tulika, Gupta Dhawal, Saha Sriparna. Expert Systems with Applications. 2020, Dec. issue
2. "Events and failures are our only means for making policy changes": learning in disaster and emergency management policies in Manitoba, Canada [J]. Haque C. Emdad, Choudhury Mahed-Ul-Islam, Sikder Md Sowayib. Natural Hazards. 2019, issue 1
3. Improving domain action classification in goal-oriented dialogues using a mutual retraining method [J]. Choong-Nyoung Seon, Hyunjung Lee, Harksoo Kim. Pattern Recognition Letters. 2014, Aug. 1 issue
4. Goal-Oriented Dialogue Policy Learning from Failures [C]. Keting Lu, Shiqi Zhang, Xiaoping Chen. AAAI Conference on Artificial Intelligence. 2019
5. Min-Max Inverse Reinforcement Learning for Learning Bi-Modal Dialogue Policies [D]. Patil, Gandharv. 2020
6. Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning [O]. Tulika Saha, Sriparna Saha, Pushpak Bhattacharyya. 2020
7. Goal-Oriented Dialogue Policy Learning from Failures [O]. Keting Lu, Shiqi Zhang, Xiaoping Chen. 2019
