Journal: Computer Speech and Language

Estimating post-editing time using a gold-standard set of machine translation errors



Abstract

With the improved quality of Machine Translation (MT) systems in recent decades, post-editing (the correction of MT errors) has gained importance in Computer-Assisted Translation (CAT) workflows. Depending on the number and the severity of the errors in the MT output, the effort required to post-edit varies from sentence to sentence. Existing Quality Estimation (QE) systems provide quality scores that reflect the quality of an MT output at sentence level or word level. However, they fail to explain the relationship between different types of MT errors and the post-editing effort required to correct them. We suggest a more informative approach to QE in which different types of MT errors are detected in a first step and are then used to estimate post-editing effort in a second step. In this paper, we define the upper boundary of such a system. We use different machine learning methods to estimate Post-Editing Time (PET) by using a gold-standard set of MT errors as features. We show that post-editing time can be estimated with high accuracy when all the translation errors in the MT output are known. Furthermore, we apply feature selection methods and investigate the predictive power of different MT error types on PET. Our results show that the same prediction performance can be achieved by using only a small subset of MT error types, indicating that successful two-step QE systems can be built with less effort in the future by detecting only the error types with the highest predictive power. © 2018 Published by Elsevier Ltd.
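The second step described in the abstract (estimating PET from known error annotations, then checking whether a subset of error types predicts equally well) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the learning methods or the feature selection procedure, so ordinary least squares and greedy forward selection stand in for them; the error type names (`accuracy`, `fluency`, `punctuation`) are hypothetical; and the eight sentences are invented, noise-free toy data in which punctuation errors cost no time by construction. The output therefore shows the mechanics only and is not evidence about real post-editing behaviour.

```python
# Toy sketch: estimate post-editing time (PET) from per-sentence error counts,
# then greedily select the error types that matter. Standard library only.

ERROR_TYPES = ["accuracy", "fluency", "punctuation"]  # hypothetical names

# Invented gold-standard error counts per sentence (accuracy, fluency, punctuation).
COUNTS = [(0, 0, 0), (1, 0, 1), (0, 2, 0), (2, 1, 1),
          (1, 3, 0), (3, 0, 2), (0, 1, 1), (2, 2, 0)]
ROWS = [dict(zip(ERROR_TYPES, c)) for c in COUNTS]

# Invented PET in seconds: 2 + 5 * accuracy + 1 * fluency + 0 * punctuation.
TIMES = [2.0 + 5.0 * r["accuracy"] + 1.0 * r["fluency"] for r in ROWS]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, times, cols):
    """Least-squares weights [intercept, w_1, ...] via the normal equations."""
    X = [[1.0] + [float(r[c]) for c in cols] for r in rows]
    k = len(cols) + 1
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, times)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row, cols):
    return weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols))


def mae(rows, times, cols):
    """In-sample mean absolute error of the fit restricted to `cols`."""
    w = fit(rows, times, cols)
    return sum(abs(predict(w, r, cols) - t) for r, t in zip(rows, times)) / len(rows)


def forward_select(rows, times, candidates, tol=1e-6):
    """Add the error type that lowers MAE most; stop when the gain is below tol."""
    chosen, best = [], mae(rows, times, [])
    while len(chosen) < len(candidates):
        err, col = min((mae(rows, times, chosen + [c]), c)
                       for c in candidates if c not in chosen)
        if best - err <= tol:
            break
        chosen.append(col)
        best = err
    return chosen, best


selected, selected_mae = forward_select(ROWS, TIMES, ERROR_TYPES)
print("selected error types:", selected)
```

Because the error is measured on the same sentences used for fitting, this sketch says nothing about generalisation; a realistic setup would evaluate on held-out data, for example with cross-validation.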
