Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

Abstract

This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to improving the effectiveness and robustness of Deep Dyna-Q (DDQ), a recently proposed framework that extends the Dyna-Q algorithm to integrate planning for task-completion dialogue policy learning. To obviate DDQ's high dependency on the quality of simulated experiences, we incorporate an RNN-based discriminator in D3Q to differentiate simulated experience from real user experience in order to control the quality of training data. Experiments show that D3Q significantly outperforms DDQ by controlling the quality of the simulated experience used for planning. The effectiveness and robustness of D3Q are further demonstrated in a domain extension setting, where the agent's capability of adapting to a changing environment is tested.
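
The filtering step described in the abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes each dialogue turn is already encoded as a fixed-size feature vector, and the names `ExperienceDiscriminator`, `filter_simulated`, `turn_dim`, and the 0.5 threshold are hypothetical choices for this sketch. In training, such a discriminator would be fit with a binary cross-entropy loss on real user episodes (label 1) versus world-model simulations (label 0); only simulated episodes it scores as sufficiently realistic are kept for the planning phase.

```python
# Minimal sketch of an RNN-based quality filter for simulated dialogue
# experience (D3Q-style). All names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class ExperienceDiscriminator(nn.Module):
    """LSTM discriminator that scores how 'real' a dialogue episode looks."""

    def __init__(self, turn_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(turn_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, episodes: torch.Tensor) -> torch.Tensor:
        # episodes: (batch, num_turns, turn_dim) turn-level feature vectors
        _, (h_n, _) = self.rnn(episodes)
        # Probability that each episode comes from a real user
        return torch.sigmoid(self.classifier(h_n[-1])).squeeze(-1)


def filter_simulated(disc: ExperienceDiscriminator,
                     simulated: torch.Tensor,
                     threshold: float = 0.5) -> torch.Tensor:
    """Keep only simulated episodes judged realistic enough for planning."""
    with torch.no_grad():
        scores = disc(simulated)
    return simulated[scores > threshold]


if __name__ == "__main__":
    disc = ExperienceDiscriminator(turn_dim=32)
    simulated_batch = torch.randn(16, 10, 32)  # 16 simulated dialogues, 10 turns each
    kept = filter_simulated(disc, simulated_batch)
    print(f"kept {kept.shape[0]} of 16 simulated episodes for planning")
```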


