Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation: A Case Study of English-Korean Translation

Abstract

There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing work on QE relies on supervised approaches trained on quality-annotated data. However, the quality scores in QE training data readily become imbalanced or skewed: the data consist mostly of sentence pairs with high translation quality and contain few pairs with low translation quality. A quality estimator trained on such imbalanced data tends to produce biased scores, assigning "high" translation quality even to poorly translated sentences. To address this imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing, which constructs more balanced QE training data by introducing greater uniformity into the training data. The procedure relies on two types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain the default skewed data naively, by manually annotating the quality of MT outputs without accounting for the imbalance. Second, we obtain the near-uniform data selectively, by manually annotating only a subset chosen from sentence pairs whose quality has already been estimated automatically. Finally, we create uniformly interpolated balanced data by combining the two types, with one half drawn from the default skewed data and the other half from the near-uniform data. We expect uniformly interpolated balancing to reflect the intrinsic skewness of the true quality distribution while mitigating the imbalance problem.
Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing yields robust predictions on both skewed and uniformly distributed quality test sets, compared with training on the other, non-balanced datasets.
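The abstract describes the balancing procedure only at a high level, so the Python sketch below illustrates one possible reading of it. Only the half-and-half combination comes from the abstract; everything else is an assumption: automatic QE scores are taken to lie in [0, 1], the near-uniform subset is drawn with equal quotas from equal-width score bins, and the function names `select_near_uniform` and `uniformly_interpolated_balance` are hypothetical, not the authors' code. The manual annotation step itself is not modeled.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]            # (source sentence, MT output)
Example = Tuple[str, str, float]  # (source sentence, MT output, quality score)


def select_near_uniform(
    pairs: Sequence[Pair],
    auto_scores: Sequence[float],
    n_select: int,
    n_bins: int = 10,
    seed: int = 0,
) -> List[Pair]:
    """Pick pairs spread evenly over automatic-QE score bins (hypothetical helper).

    Assumes auto_scores lie in [0, 1]. Each equal-width bin contributes at most
    n_select // n_bins pairs, so sparse (low-quality) bins may contribute fewer.
    """
    rng = random.Random(seed)
    bins: List[List[Pair]] = [[] for _ in range(n_bins)]
    for pair, score in zip(pairs, auto_scores):
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        idx = min(max(int(score * n_bins), 0), n_bins - 1)
        bins[idx].append(pair)
    per_bin = n_select // n_bins
    selected: List[Pair] = []
    for bucket in bins:
        rng.shuffle(bucket)
        selected.extend(bucket[:per_bin])
    return selected  # these pairs would then be annotated manually


def uniformly_interpolated_balance(
    skewed: Sequence[Example],
    near_uniform: Sequence[Example],
    size: int,
    seed: int = 0,
) -> List[Example]:
    """Mix the two manually annotated sets half-and-half, as the abstract describes."""
    half = size // 2
    if len(near_uniform) < half or len(skewed) < size - half:
        raise ValueError("not enough examples for the requested size")
    rng = random.Random(seed)
    mixed = rng.sample(list(skewed), size - half) + rng.sample(list(near_uniform), half)
    rng.shuffle(mixed)  # avoid ordering the training data by source
    return mixed
```

In this reading, the pairs returned by `select_near_uniform` would be manually annotated before being passed in as `near_uniform`. Binning on automatic scores only approximates uniformity, since the manual labels can differ from the automatic estimates.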