Back-off action selection in summary space-based POMDP dialogue systems



Abstract

This paper deals with the issue of invalid state-action pairs in the Partially Observable Markov Decision Process (POMDP) framework, with a focus on real-world tasks where the need for approximate solutions exacerbates this problem. In particular, when modelling dialogue as a POMDP, both the state and the action space must be reduced to smaller scale summary spaces in order to make learning tractable. However, since not all actions are valid in all states, the action proposed by the policy in summary space sometimes leads to an invalid action when mapped back to master space. Some form of back-off scheme must then be used to generate an alternative action. This paper demonstrates how the value function derived during reinforcement learning can be used to order back-off actions in an N-best list. Compared to a simple baseline back-off strategy and to a strategy that extends the summary space to minimise the occurrence of invalid actions, the proposed N-best action selection scheme is shown to be significantly more robust.
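The abstract does not give implementation details, so the following is only a minimal sketch of the N-best idea it describes, under stated assumptions: the learned value function is available as a Q-value per summary action for the current belief state, and a hypothetical mapping function `to_master` returns `None` when a summary action has no valid master-space counterpart. All names here (`n_best_backoff`, `to_master`, the toy actions) are illustrative and not taken from the paper.

```python
from typing import Callable, Dict, Optional, Tuple


def n_best_backoff(
    q_values: Dict[str, float],
    to_master: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Pick the highest-valued summary action that maps to a valid master action.

    q_values:  hypothetical Q-values of each summary action in the current state.
    to_master: hypothetical mapping to master space; returns None if invalid.
    """
    # Order summary actions by decreasing Q-value to form the N-best list.
    ranked = sorted(q_values, key=q_values.get, reverse=True)
    for summary_action in ranked:
        master_action = to_master(summary_action)
        if master_action is not None:
            # First valid entry in the N-best list is the back-off choice.
            return summary_action, master_action
    raise ValueError("no summary action maps to a valid master action")


# Toy example: the top-ranked action is invalid in the current master state.
toy_q_values = {"confirm_food": 1.8, "request_area": 1.2, "inform_venue": 0.4}
toy_valid = {"request_area": "request(area)", "inform_venue": "inform(venue)"}


def toy_to_master(summary_action: str) -> Optional[str]:
    # Assumed validity check: only actions listed in toy_valid can be mapped.
    return toy_valid.get(summary_action)


chosen = n_best_backoff(toy_q_values, toy_to_master)
```

In this toy case the policy's first choice, `confirm_food`, cannot be mapped to master space, so the scheme backs off to the next-best action by value instead of falling back to a fixed default action.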


Bibliographic details

Source
Automatic Speech Recognition & Understanding, 2009. ASRU 2009 | 2009 | pp. 456-461 | 6 pages
Conference location: Merano (IT)
Authors
Gasic M.; Lefevre F.; Jurcicek F.; Keizer S.; Mairesse F.; Thomson B.; Yu K.; Young S.
Author affiliation

Spoken Dialogue Systems Group, Cambridge University Engineering Department, Trumpington Street, CB2 1PZ, UK;

Original format: PDF

Similar literature

1. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems [J]. Blaise Thomson, Steve Young. Computer Speech and Language. 2010, Issue 4

2. A Tractable Hybrid DDN-POMDP Approach to Affective Dialogue Modeling for Probabilistic Frame-based Dialogue Systems [J]. Trung H. Bui, Mannes Poel, Anton Nijholt. Natural Language Engineering. 2009, Issue pta2

3. Building Adaptive Dialogue Systems Via Bayes-Adaptive POMDPs [J]. Png S., Pineau J., Chaib-Draa B. IEEE Journal of Selected Topics in Signal Processing. 2012, Issue 8

4. Back-off Action Selection in Summary Space-Based POMDP Dialogue Systems [C]. M. Gasic, F. Lefevre, F. Jurcicek. IEEE Workshop on Automatic Speech Recognition Understanding. 2009

5. Improved Intention Discovery with Classified Emotions in A Modified POMDP-Based Dialogue System [D]. Sriram, Sivaraman. 2012

6. Modeling and Planning with Macro-Actions in Decentralized POMDPs [O]. Christopher Amato, George Konidaris, Leslie P. Kaelbling

7. Back-off Action Selection in Summary Space-Based POMDP Dialogue Systems [O]. F. Lefèvre, F. Jurcicek, S. Keizer. 2010
