Journal: IEEE Transactions on Audio, Speech and Language Processing

Using multiple edit distances to automatically grade outputs from machine translation systems



Abstract

This paper addresses the challenging problem of automatically evaluating the output of machine translation (MT) systems that serve as subsystems of speech-to-speech MT (SSMT) systems. Conventional automatic MT evaluation methods include BLEU, which MT researchers have used frequently. However, BLEU has two drawbacks in SSMT evaluation. First, BLEU penalizes errors lightly at the beginning of translations and heavily in the middle, even though its assessments should be independent of position. Second, BLEU lacks tolerance for colloquial sentences with small errors, although such errors do not prevent an SSMT-mediated conversation from continuing. In this paper, the authors report a new evaluation method called "gRader based on Edit Distances (RED)" that automatically grades each MT output by using a decision tree (DT). The DT is learned from training data encoded with multiple edit distances: the normal edit distance (ED), defined by insertion, deletion, and replacement, and its extensions. Using multiple edit distances allows more tolerance than either ED or BLEU. Each evaluated MT output is assigned a grade by using the DT. RED and BLEU were compared on the task of evaluating MT systems of varying quality on ATR's Basic Travel Expression Corpus (BTEC). Experimental results show that RED significantly outperforms BLEU.
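As an illustration of the normal edit distance (ED) mentioned in the abstract, the following is a minimal Python sketch of a word-level distance with unit costs for insertion, deletion, and replacement. It is not the authors' implementation: the function name `edit_distance`, the whitespace tokenization, and the unit costs are assumptions made for this example, and the extended distances used by RED are not specified in the abstract, so they are not shown.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level edit distance between an MT output and a reference.

    Assumption: insertion, deletion, and replacement each cost 1.
    """
    # prev[j] holds the distance between the hypothesis words processed
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from i hypothesis words to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete h from the hypothesis
                    curr[j - 1] + 1,     # insert r into the hypothesis
                    prev[j - 1] + cost,  # keep or replace h with r
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical example sentences, tokenized on whitespace.
mt_output = "i would like a coffee".split()
reference = "i want a coffee".split()
distance = edit_distance(mt_output, reference)
```

In a RED-style setup, several such distances computed for one MT output would presumably form the feature vector from which the decision tree predicts a grade; the abstract does not give that encoding in detail.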

