Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces



Abstract

In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. Part of this effort is the policy optimization task, which attempts to find a policy describing how to respond to humans, in the form of a function that takes the current state of the dialogue and returns the response of the system. In this paper, we investigate deep reinforcement learning approaches to solve this problem. Particular attention is given to actor-critic methods, off-policy reinforcement learning with experience replay, and various methods aimed at reducing the bias and variance of estimators. When combined, these methods yield the previously proposed ACER algorithm, which gave competitive results in gaming environments. Those environments, however, are fully observable and have a relatively small action set, so in this paper we examine the application of ACER to dialogue policy optimization. We show that this method beats the current state of the art in deep learning approaches for spoken dialogue systems. This not only leads to a more sample-efficient algorithm that can train faster, but also allows us to apply the algorithm in more difficult environments than before. We thus experiment with learning in a very large action space, which has two orders of magnitude more actions than previously considered. We find that ACER trains significantly faster than the current state of the art.
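The ingredients the abstract names (an actor-critic learner, off-policy updates from an experience replay buffer, and truncated importance weights to control bias and variance) can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the toy environment, the `ReplayBuffer` class, the learning rate, and the clipping constant are all illustrative assumptions, and the update is a much-simplified, one-step cousin of ACER.

```python
import math
import random
from collections import deque

random.seed(0)

N_STATES, N_ACTIONS = 3, 4

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

class ReplayBuffer:
    """Stores (state, action, reward, behaviour-probability) tuples."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)

    def add(self, item):
        self.buf.append(item)

    def sample(self, k):
        return random.sample(list(self.buf), min(k, len(self.buf)))

# Tabular actor (policy preferences) and critic (state values).
prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
values = [0.0] * N_STATES

def actor_critic_update(batch, lr=0.1, clip=1.0):
    """Off-policy actor-critic step with truncated importance weights."""
    for s, a, r, mu in batch:
        pi = softmax(prefs[s])
        rho = min(clip, pi[a] / mu)        # truncated importance weight
        advantage = r - values[s]          # one-step advantage estimate
        values[s] += lr * rho * advantage  # critic update
        for b in range(N_ACTIONS):         # actor update (softmax policy gradient)
            grad = (1.0 if b == a else 0.0) - pi[b]
            prefs[s][b] += lr * rho * advantage * grad

# Toy environment: action 0 is best in every state (reward 1, else 0).
behaviour = [1.0 / N_ACTIONS] * N_ACTIONS  # uniform behaviour policy
buf = ReplayBuffer()
for _ in range(2000):
    s = random.randrange(N_STATES)
    a = random.choices(range(N_ACTIONS), weights=behaviour)[0]
    r = 1.0 if a == 0 else 0.0
    buf.add((s, a, r, behaviour[a]))
    actor_critic_update(buf.sample(8))

learned = [softmax(prefs[s]) for s in range(N_STATES)]
```

After training, the learned policy concentrates its probability mass on the rewarding action in every state, even though the data was generated entirely by the uniform behaviour policy; the stored behaviour probability `mu` and the truncation `min(clip, pi/mu)` are what make reusing those off-policy samples stable.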


