Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

Abstract

This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to improving the effectiveness and robustness of Deep Dyna-Q (DDQ), a recently proposed framework that extends the Dyna-Q algorithm to integrate planning for task-completion dialogue policy learning. To obviate DDQ's high dependency on the quality of simulated experiences, we incorporate an RNN-based discriminator in D3Q to differentiate simulated experience from real user experience in order to control the quality of training data. Experiments show that D3Q significantly outperforms DDQ by controlling the quality of the simulated experience used for planning. The effectiveness and robustness of D3Q are further demonstrated in a domain extension setting, where the agent's capability of adapting to a changing environment is tested.
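
The filtering step described in the abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes each dialogue turn is already encoded as a fixed-size feature vector, and the names `ExperienceDiscriminator`, `filter_simulated`, `turn_dim`, and the 0.5 threshold are hypothetical choices for this sketch. In training, such a discriminator would be fit with a binary cross-entropy loss on real user episodes (label 1) versus world-model simulations (label 0); only simulated episodes it scores as sufficiently realistic are kept for the planning phase.

```python
# Minimal sketch of an RNN-based quality filter for simulated dialogue
# experience (D3Q-style). All names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class ExperienceDiscriminator(nn.Module):
    """LSTM discriminator that scores how 'real' a dialogue episode looks."""

    def __init__(self, turn_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(turn_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, episodes: torch.Tensor) -> torch.Tensor:
        # episodes: (batch, num_turns, turn_dim) turn-level feature vectors
        _, (h_n, _) = self.rnn(episodes)
        # Probability that each episode comes from a real user
        return torch.sigmoid(self.classifier(h_n[-1])).squeeze(-1)


def filter_simulated(disc: ExperienceDiscriminator,
                     simulated: torch.Tensor,
                     threshold: float = 0.5) -> torch.Tensor:
    """Keep only simulated episodes judged realistic enough for planning."""
    with torch.no_grad():
        scores = disc(simulated)
    return simulated[scores > threshold]


if __name__ == "__main__":
    disc = ExperienceDiscriminator(turn_dim=32)
    simulated_batch = torch.randn(16, 10, 32)  # 16 simulated dialogues, 10 turns each
    kept = filter_simulated(disc, simulated_batch)
    print(f"kept {kept.shape[0]} of 16 simulated episodes for planning")
```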


