IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Off-policy learning in large-scale POMDP-based dialogue systems

Abstract

Reinforcement learning (RL) is now part of the state of the art in spoken dialogue system (SDS) optimisation. The best-performing RL methods, such as those based on Gaussian processes, require testing small changes to the policy in order to assess them as improvements or degradations. This process is called on-policy learning. However, it can produce system behaviours that are unacceptable to users. Ideally, a learning algorithm should infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, it should learn off-policy. Such methods usually fail to scale up and are therefore not suited to real-world systems. In this contribution, a sample-efficient, online and off-policy RL algorithm is proposed to learn an optimal policy. The algorithm is combined with a compact non-linear value-function representation (namely a multilayer perceptron), enabling it to handle large-scale systems.
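This record carries no code, and the abstract does not spell out the algorithm. As a rough illustration of the combination it describes, the sketch below shows generic online, off-policy semi-gradient Q-learning with a small multilayer-perceptron value function; it is not the authors' specific algorithm, and all dimensions, the behaviour policy, and the toy environment dynamics are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup; the paper does not specify these sizes.
STATE_DIM = 8      # e.g., a compact summary of the POMDP belief state
N_ACTIONS = 4      # e.g., a small set of dialogue acts
HIDDEN = 32
GAMMA = 0.95
LR = 1e-2

# One-hidden-layer perceptron mapping a state to Q(s, a) for every action.
W1 = rng.normal(0, 0.1, (HIDDEN, STATE_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_ACTIONS, HIDDEN))
b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Forward pass; also returns hidden activations for the backward pass."""
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2, h

def td_update(s, a, r, s_next, done):
    """One online off-policy Q-learning step: the max in the target
    evaluates the greedy policy regardless of how `a` was chosen."""
    global W1, b1, W2, b2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * np.max(q_next)
    delta = target - q[a]               # TD error
    # Semi-gradient of 0.5 * delta**2 (target treated as a constant).
    grad_q = np.zeros(N_ACTIONS)
    grad_q[a] = -delta
    W2 -= LR * np.outer(grad_q, h)
    b2 -= LR * grad_q
    dh = (W2.T @ grad_q) * (1 - h**2)   # tanh derivative
    W1 -= LR * np.outer(dh, s)
    b1 -= LR * dh

def behavior_policy(s):
    """Fixed, non-optimal but 'acceptable' policy that generates the data;
    the update above never depends on this choice, hence off-policy."""
    return rng.integers(N_ACTIONS)

# Toy interaction loop with stand-in dynamics and reward.
s = rng.normal(size=STATE_DIM)
for step in range(10_000):
    a = behavior_policy(s)
    s_next = rng.normal(size=STATE_DIM)  # placeholder transition
    r = float(a == 0) - 0.1              # placeholder reward
    td_update(s, a, r, s_next, done=False)
    s = s_next
```

The off-policy property comes from the target: the max over next-state Q-values evaluates the greedy target policy, while the transitions themselves come from the fixed behaviour policy, so an optimal strategy can be inferred from interactions the non-optimal strategy generated.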
