Macro-Average: Rare Types Are Important Too

Abstract

While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy. Model-based MT metrics trained on segment-level human judgments have emerged as an attractive replacement due to strong correlation results. These models, however, require potentially expensive re-training for new domains and languages. Furthermore, their decisions are inherently non-transparent and appear to reflect unwelcome biases. We explore the simple type-based classifier metric, MacroF1, and study its applicability to MT evaluation. We find that MacroF1 is competitive on direct assessment, and outperforms others in indicating downstream cross-lingual information retrieval task performance. Further, we show that MacroF1 can be used to effectively compare supervised and unsupervised neural machine translation, and reveal significant qualitative differences in the methods' outputs.
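To make the idea of a type-based, macro-averaged F1 concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, treats every distinct token as a type, counts clipped matches per segment, and omits any smoothing; the function name `macro_f1` and these choices are illustrative assumptions, and the paper's exact formulation may differ.

```python
from collections import Counter


def macro_f1(hypotheses, references):
    """Unweighted mean of per-type F1 over all word types seen on either side.

    Assumptions (not taken from the source): whitespace tokenization,
    one reference per hypothesis, no smoothing.
    """
    match, hyp_total, ref_total = Counter(), Counter(), Counter()
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = Counter(hyp.split())
        ref_counts = Counter(ref.split())
        hyp_total.update(hyp_counts)
        ref_total.update(ref_counts)
        # Clipped matches: a type is credited at most as many times
        # as it occurs in the reference segment.
        match.update(hyp_counts & ref_counts)

    types = set(hyp_total) | set(ref_total)
    if not types:
        return 0.0

    total = 0.0
    for t in types:
        precision = match[t] / hyp_total[t] if hyp_total[t] else 0.0
        recall = match[t] / ref_total[t] if ref_total[t] else 0.0
        if precision + recall > 0:
            total += 2 * precision * recall / (precision + recall)
    # Macro-average: every type contributes equally, so rare types
    # weigh as much as frequent ones.
    return total / len(types)
```

Because each type contributes equally to the mean, a system that gets frequent function words right but misses rare content words is penalized more heavily than it would be under a frequency-weighted (micro) average.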