Journal: Computer Speech and Language

Reinforcement-learning based dialogue system for human-robot interactions with socially-inspired rewards

Abstract

This paper investigates some conditions under which polarized user appraisals gathered throughout the course of a vocal interaction between a machine and a human can be integrated into a reinforcement learning-based dialogue manager. More specifically, we discuss how this information can be cast into socially-inspired rewards to speed up policy optimisation for both efficient task completion and user adaptation in an online learning setting. For this purpose, a potential-based reward shaping method is combined with a sample-efficient reinforcement learning algorithm to offer a principled framework for coping with these potentially noisy interim rewards. The proposed scheme will greatly facilitate the system's development by allowing the designer to teach the system through explicit positive/negative feedback, given as hints about task progress, in the early stage of training. At a later stage, the approach will be used as a way to ease the adaptation of the dialogue policy to specific user profiles. Experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS), support our claims in two configurations: firstly, with a user simulator in the tourist information domain (and thus simulated appraisals), and secondly, in the context of human-robot dialogue with real user trials.
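
As a rough illustration of the potential-based reward shaping idea mentioned in the abstract, the sketch below adds a shaping term F(s, s') = gamma * phi(s') - phi(s) to the environment reward. This is the generic textbook form, not the paper's implementation: the abstract does not specify the potential function, so `phi`, the state names, the appraisal scores, and the value of `gamma` are all hypothetical choices made for illustration.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.95,
) -> float:
    """Return env_reward + F(s, s'), with F(s, s') = gamma * phi(s') - phi(s)."""
    shaping = gamma * potential(next_state) - potential(state)
    return env_reward + shaping


# Hypothetical potential: a score per dialogue state derived from polarized
# user appraisals (+1 for positive, -1 for negative). These state names and
# values are assumptions, not taken from the paper.
APPRAISAL_SCORE = {"start": 0.0, "slot_filled": 1.0, "misunderstood": -1.0}


def phi(state: State) -> float:
    """Look up the appraisal-based potential; unknown states get 0."""
    return APPRAISAL_SCORE.get(state, 0.0)


if __name__ == "__main__":
    # A per-turn penalty of -1 is softened when the user signals progress
    # and made worse when the user signals a misunderstanding.
    print(shaped_reward(-1.0, "start", "slot_filled", phi, gamma=0.9))
    print(shaped_reward(-1.0, "start", "misunderstood", phi, gamma=0.9))
```

Because the shaping term is a difference of potentials, its discounted sum along a trajectory telescopes to a quantity that depends only on the first and last states, which is why this form of shaping is known to leave the optimal policy unchanged.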