Journal: ACM Transactions on Asian Language Information Processing

Improved Chinese-English SMT with Chinese 'DE' Construction Classification and Reordering


Abstract

Syntactic reordering on the source side has been shown to help handle word-order differences between source and target languages in SMT. In this article, we focus on the Chinese "DE" construction, which is flexible and ubiquitous in Chinese and can be translated into English in many different ways, making it a major source of word-order differences that affect translation quality. We study the "DE" construction for Chinese-English SMT and propose a new classifier model, a discriminative latent variable model (DPLVM), with new features that improve classification accuracy and, indirectly, translation quality compared with a log-linear classifier. The DE classifier is used to recognize DE structures in both the training and test sentences of Chinese, after which word reordering is performed so that the Chinese sentences better match English word order. To investigate the impact of source-side DE classification and reordering on different types of SMT systems, namely PB-SMT, hierarchical PB-SMT (HPB-SMT), and syntax-based SMT (SAMT), we conduct a series of experiments on the NIST 2005 and 2008 test sets. The experimental results show that, in terms of BLEU score, MT systems using data reordered by our proposed model outperform the baseline systems by 3.01% and 4.03% relative on the NIST 2005 test set, and by 4.64% and 4.62% relative on the NIST 2008 test set, for PB-SMT and HPB-SMT respectively. However, the DE classification method does not yield a significant improvement for SAMT. We also conducted experiments to evaluate the effect of our DE classification and reordering approach on word alignment and the phrase table for these three types of SMT systems.
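The classify-then-reorder step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two labels `SWAP` and `KEEP`, the function `reorder_de`, and the span indices are hypothetical, and the label is passed in by hand because the abstract does not give the classifier's class inventory or features. The sketch only shows how a construction of the form "A DE B" might be rewritten as "B DE A" when the predicted class indicates that English places the modifier after the head.

```python
from typing import List

# Hypothetical labels; the abstract does not list the classifier's classes.
SWAP = "swap"  # English order puts the modifier A after the head B
KEEP = "keep"  # Chinese order already matches English order


def reorder_de(tokens: List[str], a_start: int, de_index: int,
               b_end: int, label: str) -> List[str]:
    """Reorder one "A DE B" construction inside a tokenized sentence.

    A is tokens[a_start:de_index], the DE particle is tokens[de_index],
    and B is tokens[de_index + 1:b_end]. Returns a new token list.
    """
    if not (0 <= a_start < de_index and de_index + 1 < b_end <= len(tokens)):
        raise ValueError("span does not describe a non-empty 'A DE B'")
    if label != SWAP:
        return list(tokens)  # leave the sentence unchanged
    a = tokens[a_start:de_index]
    de = tokens[de_index]
    b = tokens[de_index + 1:b_end]
    # Move the head B in front of the modifier A, keeping DE between them.
    return tokens[:a_start] + b + [de] + a + tokens[b_end:]


# "he visited the university located in Beijing"
sentence = ["他", "参观", "了", "位于", "北京", "的", "大学"]
reordered = reorder_de(sentence, a_start=3, de_index=5, b_end=7, label=SWAP)
print(" ".join(reordered))
```

In this made-up sentence, the modifier "位于 北京" (located in Beijing) moves behind the head "大学" (university), which mirrors the English order. In a real pipeline the span and label would come from a parser and the trained DE classifier, and the same reordering would be applied to both training and test data.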
