Natural Language Engineering

How to evaluate machine translation: A review of automated and human metrics


Abstract

This article presents the most up-to-date and influential automated, semi-automated and human metrics used to evaluate the quality of machine translation (MT) output, and provides the necessary background for MT evaluation projects. Evaluation is, as is widely acknowledged, highly relevant to the improvement of MT. The article is divided into three parts: the first is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges that neural machine translation (NMT) poses for evaluation. The first part covers reference translation-based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to whether human judges directly express a so-called subjective evaluation judgment, such as 'good' or 'better than', or do not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ) and are therefore called 'DEJ-based evaluation methods', while the latter are called 'non-DEJ-based evaluation methods'. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and post-editing are detailed, with definitions and guidelines, rendering this article a useful guide for evaluation projects. Following the detailed presentation of these metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective, to provide useful methodological tools for interdisciplinary research groups that evaluate MT systems.
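To illustrate the reference translation-based metrics the review surveys, the sketch below implements a BLEU-style score (clipped n-gram precisions combined with a brevity penalty) in plain Python. This is a minimal, smoothed toy version for a single candidate/reference pair, assumed here only for exposition; it is not the article's own metric nor the official BLEU implementation, and real evaluation projects would use a standard toolkit such as sacreBLEU.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions (with add-one smoothing) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate precision.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing keeps one missing n-gram order from zeroing the score.
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0, while repeating a single correct word ("the the the …") is penalized by clipping, which is the property that distinguishes BLEU-style precision from naive word overlap.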