Conference: AAAI Conference on Artificial Intelligence

Goal-Oriented Dialogue Policy Learning from Failures



Abstract

Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues and the very few successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but vanilla HER is inapplicable to dialogue learning because the goals are implicit. In this work, we develop two complex HER methods that provide different tradeoffs between complexity and performance and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods outperform existing experience replay methods (as applied to deep Q-networks) in learning rate.
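
To make the idea of learning from failures concrete, the following is a minimal sketch of vanilla HER goal relabeling in a toy setting where goals are explicit integers. It is not either of the two methods developed in this work, which the abstract does not describe. All names here (`Transition`, `sparse_reward`, `her_relabel`) and the -1/0 reward scheme are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    """One step of experience; field names are hypothetical."""
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def her_relabel(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode as if the last state reached had been the goal.

    A failed episode carries only -1 rewards; after relabeling, its final
    transition earns the success reward, so it becomes usable experience.
    """
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# A failed episode: the agent wanted state 5 but only reached state 2.
failed = [
    Transition(state=0, action=1, next_state=1, goal=5, reward=-1.0),
    Transition(state=1, action=1, next_state=2, goal=5, reward=-1.0),
]
relabeled = her_relabel(failed)

# Both the original and the relabeled transitions would go into the replay buffer.
replay_buffer = failed + relabeled
```

The relabeling step relies on reading an achieved goal directly off the final state. In a dialogue the goal is implicit, so no such direct readout exists, which is the reason the abstract gives for vanilla HER being inapplicable.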
