Natural Language Engineering

How to evaluate machine translation: A review of automated and human metrics


Abstract

This article presents the most up-to-date and influential automated, semi-automated and human metrics used to evaluate the quality of machine translation (MT) output, and provides the necessary background for MT evaluation projects. Evaluation is, as is widely acknowledged, highly relevant to the improvement of MT. The article is divided into three parts: the first is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges that neural machine translation (NMT) poses for evaluation. The first part covers reference translation-based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to whether human judges directly express a so-called subjective evaluation judgment, such as 'good' or 'better than', or do not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ) and are therefore called 'DEJ-based evaluation methods', while the latter are called 'non-DEJ-based evaluation methods'. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and post-editing are detailed, with definitions and guidelines, rendering this article a useful guide for evaluation projects. Following the detailed presentation of these metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective, to provide useful methodological tools for interdisciplinary research groups that evaluate MT systems.
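To illustrate the reference translation-based metrics the review surveys, the sketch below implements a BLEU-style score (clipped n-gram precisions combined with a brevity penalty) in plain Python. This is a minimal, smoothed toy version for a single candidate/reference pair, assumed here only for exposition; it is not the article's own metric nor the official BLEU implementation, and real evaluation projects would use a standard toolkit such as sacreBLEU.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions (with add-one smoothing) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate precision.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing keeps one missing n-gram order from zeroing the score.
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0, while repeating a single correct word ("the the the …") is penalized by clipping, which is the property that distinguishes BLEU-style precision from naive word overlap.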