Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue

Abstract

Traditional recommendation systems produce static rather than interactive recommendations invariant to a user's specific requests, clarifications, or current mood, and can suffer from the cold-start problem if a user's tastes are unknown. These issues can be alleviated by treating recommendation as an interactive dialogue task instead, where an expert recommender can sequentially ask about someone's preferences, react to their requests, and recommend more appropriate items. In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. The task is specifically designed as a cooperative game between two players working towards a quantifiable common goal. We leverage the dataset to develop an end-to-end dialogue system that can simultaneously converse and recommend. Models are first trained to imitate the behavior of human players without considering the task goal itself (supervised training). We then fine-tune our models on simulated bot-bot conversations between two paired pre-trained models (bot-play), in order to achieve the dialogue goal. Our experiments show that models fine-tuned with bot-play learn improved dialogue strategies, reach the dialogue goal more often when paired with a human, and are rated as more consistent by humans compared to models trained without bot-play. The dataset and code are publicly available through the ParlAI framework.
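The two-stage recipe described in the abstract (supervised imitation of human-human games, then bot-play fine-tuning toward the task reward) can be sketched roughly as follows. This is a minimal, illustrative PyTorch sketch, not the paper's ParlAI implementation: ToyPolicy, supervised_step, and bot_play_episode are hypothetical names, each "utterance" is a single token, only one bot is tuned, and the update is a simple REINFORCE-style step on a binary "correct movie recommended" reward.

# Hypothetical sketch of the two-stage training recipe: (1) supervised
# imitation of human-human dialogues, then (2) bot-play fine-tuning of
# paired bots with a REINFORCE-style update on the task reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, N_MOVIES = 1000, 64, 128, 50

class ToyPolicy(nn.Module):
    """Tiny encoder that reads a token-id dialogue history and emits either
    next-utterance-token logits or movie-recommendation logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.gru = nn.GRU(EMB, HID, batch_first=True)
        self.utter_head = nn.Linear(HID, VOCAB)         # next-token logits
        self.recommend_head = nn.Linear(HID, N_MOVIES)  # which movie to recommend
    def forward(self, history):                         # history: (1, T) long tensor
        _, h = self.gru(self.embed(history))
        return self.utter_head(h[-1]), self.recommend_head(h[-1])

def supervised_step(model, optimizer, history, gold_next_token):
    """Stage 1: imitate human players (cross-entropy on the next token)."""
    utter_logits, _ = model(history)
    loss = F.cross_entropy(utter_logits, gold_next_token)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def bot_play_episode(seeker, expert, goal_movie, turns=4):
    """Stage 2: two pre-trained bots converse; the expert's final
    recommendation is scored against the goal movie (reward 1/0)."""
    history = torch.randint(0, VOCAB, (1, 1))           # stand-in for the game context
    log_probs = []
    for _ in range(turns):
        for bot in (seeker, expert):
            utter_logits, _ = bot(history)
            dist = torch.distributions.Categorical(logits=utter_logits)
            tok = dist.sample()
            if bot is expert:                            # only the expert is tuned here
                log_probs.append(dist.log_prob(tok))
            history = torch.cat([history, tok.view(1, 1)], dim=1)
    _, rec_logits = expert(history)
    rec_dist = torch.distributions.Categorical(logits=rec_logits)
    rec = rec_dist.sample()
    log_probs.append(rec_dist.log_prob(rec))
    reward = 1.0 if rec.item() == goal_movie else 0.0
    return reward, torch.stack(log_probs).sum()

def bot_play_finetune_step(expert, optimizer, seeker, goal_movie):
    """REINFORCE update on the expert from one simulated bot-bot game."""
    reward, total_log_prob = bot_play_episode(seeker, expert, goal_movie)
    loss = -reward * total_log_prob                      # maximize expected task reward
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return reward

if __name__ == "__main__":
    expert, seeker = ToyPolicy(), ToyPolicy()
    opt = torch.optim.Adam(expert.parameters(), lr=1e-3)
    # Stage 1 (one toy step): imitate a human utterance.
    supervised_step(expert, opt, torch.randint(0, VOCAB, (1, 5)),
                    torch.randint(0, VOCAB, (1,)))
    # Stage 2 (one toy step): fine-tune via bot-play toward the goal movie.
    bot_play_finetune_step(expert, opt, seeker, goal_movie=7)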
