AAAI Conference on Artificial Intelligence

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Abstract

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsensical replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model generates higher-quality responses and achieves better overall performance than the state of the art.
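The abstract does not give the reward model's exact form, but a standard way to realize an adversarial-inverse-reinforcement-learning reward for a generator is the AIRL parameterization (Fu et al., 2018): the discriminator is written as D = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s)), so the generator reward log D - log(1 - D) collapses to f(s,a) - log pi(a|s), a dense signal that avoids the sparse, saturating output of a raw binary discriminator. The PyTorch sketch below illustrates that construction only; the class name AIRLRewardModel, the two-layer scorer, and the use of pooled context/response embeddings are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class AIRLRewardModel(nn.Module):
    """AIRL-style reward model over (context, response) pairs (a sketch)."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        # f(s, a): learned score of a dialogue context s and response a.
        # A simple MLP over concatenated embeddings; assumed, not from the paper.
        self.f = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, context_emb, response_emb, log_pi):
        f = self.f(torch.cat([context_emb, response_emb], dim=-1)).squeeze(-1)
        # AIRL discriminator: D = exp(f) / (exp(f) + pi(a|s)), hence
        # log D = f - logaddexp(f, log_pi) in numerically stable form.
        log_d = f - torch.logaddexp(f, log_pi)
        # Generator reward log D - log(1 - D) simplifies to f - log pi.
        reward = f - log_pi
        return log_d, reward

# Toy usage with random embeddings (batch of 8, 256-dim pooled vectors).
model = AIRLRewardModel()
ctx = torch.randn(8, 256)        # pooled encodings of dialogue contexts
rsp = torch.randn(8, 256)        # pooled encodings of candidate responses
log_pi = -5.0 * torch.rand(8)    # generator log-probabilities of the responses
log_d, reward = model(ctx, rsp, log_pi)

In such a setup, f would be fit with binary cross-entropy to separate human responses (maximizing log D) from generated ones (maximizing log(1 - D) = log_pi - logaddexp(f, log_pi)), while the generator would be updated by policy gradient with reward as the return; this is one plausible reading of how a denser, more stable reward signal reaches generator training than a raw discriminator probability would provide.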
