
Language Model Augmented Relevance Score



Abstract

Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.
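The idea of scoring against augmented references can be illustrated with a minimal sketch. This is not the MARS implementation: the similarity function here is a plain token-overlap F1 standing in for whatever scorer the paper uses, the augmented reference is a hand-written placeholder rather than the output of a language model guided by reinforcement learning, and taking the maximum over references is an assumed aggregation rule. The names `token_f1` and `score_with_references` are hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings (a stand-in similarity, not MARS)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_with_references(candidate: str, references: list[str]) -> float:
    """Score a candidate by its best match over all references (assumed rule)."""
    return max(token_f1(candidate, r) for r in references)


# One human reference, plus one placeholder "augmented" reference.
# In the paper the augmented references are generated, not written by hand.
human_refs = ["the cat sat on the mat"]
augmented_refs = ["a cat was sitting on the rug"]

# A candidate that is acceptable but worded differently from the human reference.
candidate = "a cat was sitting on a rug"

baseline = score_with_references(candidate, human_refs)
augmented = score_with_references(candidate, human_refs + augmented_refs)

print(f"human reference only:      {baseline:.3f}")
print(f"with augmented references: {augmented:.3f}")
```

Under these assumptions, the candidate scores low against the single human reference and higher once the additional reference is included, which is the behaviour the abstract describes for text that is correct but deviates from the given reference.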
