Conference: Automatic Speech Recognition & Understanding, 2009. ASRU 2009

Back-off action selection in summary space-based POMDP dialogue systems

Abstract

This paper deals with the issue of invalid state-action pairs in the Partially Observable Markov Decision Process (POMDP) framework, with a focus on real-world tasks where the need for approximate solutions exacerbates this problem. In particular, when modelling dialogue as a POMDP, both the state and the action space must be reduced to smaller scale summary spaces in order to make learning tractable. However, since not all actions are valid in all states, the action proposed by the policy in summary space sometimes leads to an invalid action when mapped back to master space. Some form of back-off scheme must then be used to generate an alternative action. This paper demonstrates how the value function derived during reinforcement learning can be used to order back-off actions in an N-best list. Compared to a simple baseline back-off strategy and to a strategy that extends the summary space to minimise the occurrence of invalid actions, the proposed N-best action selection scheme is shown to be significantly more robust.
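
The N-best back-off idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the action names, the Q-values, and the `make_mapper` validity rule (a confirm action is only valid when the belief state holds a top hypothesis for that slot) are all hypothetical, chosen only to show how a learned value estimate can order back-off candidates when the top summary action has no valid master-space counterpart.

```python
from typing import Callable, Dict, Optional, Tuple


def make_mapper(top_hypotheses: Dict[str, str]) -> Callable[[str], Optional[str]]:
    """Build a toy summary-to-master mapping (hypothetical validity rule)."""

    def to_master(summary_action: str) -> Optional[str]:
        verb, _, slot = summary_action.partition("_")
        if verb == "request":
            # Requesting a slot is always valid in this toy domain.
            return f"request({slot})"
        value = top_hypotheses.get(slot)
        if verb == "confirm" and value is not None:
            # Confirming needs a concrete value to confirm.
            return f"confirm({slot}={value})"
        return None  # no valid master action exists in this state

    return to_master


def select_action_nbest(
    q_values: Dict[str, float],
    to_master: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (summary_action, master_action) for the best valid action."""
    # Order summary actions by estimated value, best first (the N-best list).
    ranked = sorted(q_values, key=lambda a: q_values[a], reverse=True)
    for summary_action in ranked:
        master_action = to_master(summary_action)
        if master_action is not None:
            return summary_action, master_action
    raise ValueError("no summary action maps to a valid master action")


# Assumed Q-values for one belief state; in practice these would come
# from the value function learned during reinforcement learning.
q_values = {"confirm_food": 0.9, "request_food": 0.6, "request_area": 0.4}

# The belief state has no hypothesis for "food", so the top-ranked
# "confirm_food" is invalid and the selector backs off to "request_food".
chosen = select_action_nbest(q_values, make_mapper({"area": "north"}))
```

In this sketch the back-off order is determined entirely by the value estimates, so no hand-written fallback rule is required; a fixed default action would instead ignore how the policy ranks the remaining alternatives.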
