Computer Speech and Language

Reinforcement learning for parameter estimation in statistical spoken dialogue systems


Abstract

Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system's responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the expected behaviour of a user when interacting with the system. Ideally the parameters of this dialogue model should be also optimised to maximise the expected cumulative reward. This article presents two novel reinforcement learning algorithms for learning the parameters of a dialogue model. First, the Natural Belief Critic algorithm is designed to optimise the model parameters while the policy is kept fixed. This algorithm is suitable, for example, in systems using a handcrafted policy, perhaps prescribed by other design considerations. Second, the Natural Actor and Belief Critic algorithm jointly optimises both the model and the policy parameters. The algorithms are evaluated on a statistical dialogue system modelled as a Partially Observable Markov Decision Process in a tourist information domain. The evaluation is performed with a user simulator and with real users. The experiments indicate that model parameters estimated to maximise the expected reward function provide improved performance compared to the baseline handcrafted parameters.
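
The abstract only names the algorithms, but the core idea behind the Natural Belief Critic can be sketched: sample the dialogue-model parameters from a learnable distribution at the start of each episode, run the dialogue under the fixed policy, and update the distribution by natural-gradient ascent on the expected cumulative reward J(mu) = E[R]. The sketch below is illustrative only, not the paper's implementation: the Gaussian sampling distribution, the toy run_episode reward, and all hyperparameters are assumptions chosen for brevity (the actual algorithm parameterises the probability distributions of a POMDP dialogue model).

```python
import numpy as np

def run_episode(theta, rng):
    # Toy stand-in (an assumption, not from the paper): in the real system
    # this would run one dialogue with model parameters theta under the
    # FIXED policy and return the cumulative reward.
    optimum = np.array([0.5, -0.2, 0.8])
    return -np.sum((theta - optimum) ** 2) + rng.normal(scale=0.1)

def natural_belief_critic_sketch(dim=3, iterations=200, episodes=50,
                                 lr=0.1, seed=0):
    """Minimal sketch of episodic natural-gradient estimation of
    dialogue-model parameters while the policy stays fixed. Parameters
    theta are drawn per episode from a Gaussian with learnable mean mu;
    the Gaussian is purely for brevity."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)          # learnable sampling-distribution parameters
    sigma = 0.3                 # fixed exploration spread

    for _ in range(iterations):
        scores, rewards = [], []
        for _ in range(episodes):
            theta = rng.normal(mu, sigma)
            # Score function of log N(theta; mu, sigma^2 I) w.r.t. mu
            scores.append((theta - mu) / sigma ** 2)
            rewards.append(run_episode(theta, rng))
        S = np.array(scores)
        R = np.array(rewards)
        R = R - R.mean()                        # baseline reduces variance
        grad = S.T @ R / episodes               # vanilla gradient estimate
        fisher = S.T @ S / episodes             # empirical Fisher matrix
        nat_grad = np.linalg.solve(fisher + 1e-6 * np.eye(dim), grad)
        mu += lr * nat_grad                     # natural-gradient ascent
    return mu

print(natural_belief_critic_sketch())
```

The natural gradient (the Fisher-preconditioned update in the last lines) is what distinguishes this family of methods from plain policy gradient: it makes the step size invariant to how the sampling distribution is parameterised, which typically stabilises learning. The Natural Actor and Belief Critic algorithm extends the same scheme by updating the policy parameters jointly with mu.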
