Foreign-language conference: Annual meeting of the Association for Computational Linguistics

MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames



Abstract

We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, non-automatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.
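As a rough illustration of what "matching semantic role fillers" could look like, the sketch below scores MT output against a reference by the overlap of (predicate, role, filler) triples. It is only a toy under stated assumptions, not the MEANT formula from the paper: the frame representation, the role labels (`agent`, `patient`, `time`), the exact-string matching, the unweighted F1, and the function name `role_filler_f1` are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical frame representation: (predicate, {role_label: filler_text}).
Frame = Tuple[str, Dict[str, str]]


def _triples(frames: List[Frame]) -> Set[Tuple[str, str, str]]:
    """Flatten frames into a set of (predicate, role, lowercased filler) triples."""
    return {
        (predicate, role, filler.lower())
        for predicate, roles in frames
        for role, filler in roles.items()
    }


def role_filler_f1(mt_frames: List[Frame], ref_frames: List[Frame]) -> float:
    """Toy F1 over exactly matching role fillers (an assumption, not MEANT itself)."""
    mt = _triples(mt_frames)
    ref = _triples(ref_frames)
    matched = len(mt & ref)
    if matched == 0:
        # Covers empty inputs as well as no overlap.
        return 0.0
    precision = matched / len(mt)   # share of MT fillers found in the reference
    recall = matched / len(ref)     # share of reference fillers found in the MT
    return 2 * precision * recall / (precision + recall)


# Invented example sentences: the MT output gets the agent and time right
# but mistranslates the patient.
ref = [("bought", {"agent": "the company", "patient": "three ships", "time": "last year"})]
mt = [("bought", {"agent": "the company", "patient": "three boats", "time": "last year"})]
score = role_filler_f1(mt, ref)
print(round(score, 3))
```

Here two of the three fillers match, so precision and recall are both 2/3. A real system would need partial matching of fillers and some way of aligning predicates, neither of which this sketch attempts.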
