Language Model Augmented Relevance Score


Abstract

Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.
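The idea of scoring a candidate against extra, augmented references can be illustrated with a toy sketch. It is not the MARS scoring function: the unigram-F1 similarity, the max aggregation, the function names, and the hand-written "augmented" reference are all assumptions made for illustration, and no language model or reinforcement learning is involved.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings (assumed similarity)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate, human_refs, augmented_refs=()):
    """Best similarity over human references plus any augmented ones (assumed aggregation)."""
    references = list(human_refs) + list(augmented_refs)
    if not references:
        raise ValueError("at least one reference is required")
    return max(unigram_f1(candidate, ref) for ref in references)


human_refs = ["the cat sat on the mat"]
# Hypothetical augmented reference, written by hand for this example.
augmented_refs = ["a cat was resting on the rug"]
candidate = "a cat was resting on a rug"

base = multi_reference_score(candidate, human_refs)
augmented = multi_reference_score(candidate, human_refs, augmented_refs)
print(f"human reference only: {base:.3f}")
print(f"with augmented reference: {augmented:.3f}")
```

In this example the candidate shares few tokens with the single human reference, so an overlap-based score stays low; adding a reference that covers the alternative wording raises the score for the same candidate.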


Bibliographic details

Source
International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics | 2021 | pp. 6677-6690 | 14 pages
Authors
Ruibo Liu; Jason Wei; Soroush Vosoughi
Original format: PDF

