International Conference on Computational Linguistics

A Learning-Exploring Method to Generate Diverse Paraphrases with Multi-Objective Deep Reinforcement Learning



Abstract

Paraphrase generation (PG) is of great importance to many downstream tasks in natural language processing. Diversity is essential to PG for enhancing the generalization capability and robustness of downstream applications. Recently, neural sequence-to-sequence (Seq2Seq) models have shown promising results in PG. However, traditional model training for PG optimizes model predictions against a single reference with a cross-entropy loss, an objective that cannot encourage the model to generate diverse paraphrases. In this work, we present a novel multi-objective learning approach to PG. We propose a learning-exploring method that generates sentences from the learned data distribution as learning objectives, and employ reinforcement learning to combine these new learning objectives for model training. We first design a sample-based algorithm to explore diverse sentences. Then we introduce several reward functions that evaluate the sampled sentences as learning signals in terms of expressive diversity and semantic fidelity, aiming to generate diverse and high-quality paraphrases. To effectively optimize model performance across these different evaluation aspects, we use a GradNorm-based algorithm that automatically balances the training objectives. Experiments and analyses on the Quora and Twitter datasets demonstrate that our proposed method not only achieves a significant increase in diversity but also improves generation quality over several state-of-the-art baselines.
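The abstract mentions a GradNorm-based algorithm for balancing the multiple training objectives (e.g., a cross-entropy term plus reward-derived terms for expressive diversity and semantic fidelity). Below is a minimal sketch of GradNorm-style weight balancing, not the authors' released code: PyTorch, the names `shared_params`, `initial_losses`, and the value of `alpha` are illustrative assumptions, and the per-objective losses are presumed to be scalars.

```python
import torch

def gradnorm_step(task_losses, weights, shared_params, initial_losses, alpha=1.5):
    """One GradNorm-style step.

    task_losses:    list of scalar losses, one per objective
                    (e.g., cross-entropy, diversity reward, fidelity reward)
    weights:        nn.Parameter of shape [num_tasks], the learnable task weights
    shared_params:  tensors of a layer shared by all objectives
    initial_losses: tensor of each objective's loss recorded at step 0
    """
    losses = torch.stack(task_losses)

    # Total loss for the model parameters; weights are detached here so that
    # the task weights themselves are updated only by the GradNorm loss below.
    total_loss = (weights.detach() * losses).sum()

    # Gradient norm of each *weighted* objective w.r.t. the shared parameters.
    weighted = weights * losses
    grad_norms = []
    for wl in weighted:
        grads = torch.autograd.grad(wl, shared_params,
                                    retain_graph=True, create_graph=True)
        grad_norms.append(torch.cat([g.flatten() for g in grads]).norm())
    grad_norms = torch.stack(grad_norms)

    with torch.no_grad():
        ratios = losses / initial_losses          # how far each loss has dropped
        r = ratios / ratios.mean()                # relative inverse training rate
        target = grad_norms.mean() * r.pow(alpha) # balanced gradient-norm targets

    # Drives each objective's gradient norm toward its target; its gradient
    # flows only into the task weights (via create_graph above).
    gradnorm_loss = (grad_norms - target).abs().sum()
    return total_loss, gradnorm_loss
```

In a typical training loop, one would backpropagate `total_loss + gradnorm_loss`, step separate optimizers for the model parameters and for `weights`, and then renormalize `weights` so they sum to the number of objectives, as in the original GradNorm formulation.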
