Journal: Machine Translation

What types of word alignment improve statistical machine translation?

Abstract

In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. However, there is a need for systematic study as to what alignment characteristics can benefit MT under specific experimental settings such as the type of MT system, the language pair or the type or size of the corpus. In this paper we perform, in each of these experimental settings, a statistical analysis of the data and study the sample correlation coefficients between a number of alignment or phrase table characteristics and variables such as the phrase table size, the number of untranslated words or the BLEU score. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese-to-English FBIS and BTEC data, and Spanish-to-English European Parliament data. We find that the alignment characteristics which help in translation greatly depend on the MT system and on the corpus size. We give alignment hints to improve BLEU score, depending on the SMT system used and the type of corpus. For example, for phrase-based SMT, dense alignments are required with larger corpora, especially on the target side, while with smaller corpora, more precise, sparser alignments are better, especially on the source side. Avoiding some long-distance crossing links may also improve BLEU score with small corpora. We take these conclusions into account to modify two types of alignment systems, and get 1 to 1.6 % relative improvements in BLEU score on two held-out corpora, although the improved system is different in each corpus.
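The analysis described in the abstract rests on sample correlation coefficients between alignment characteristics and outcome variables such as BLEU. The following is a minimal sketch of that kind of computation, not the authors' code. It assumes a word alignment is represented as a set of (source index, target index) links; the function names and the two example characteristics (links per source word, number of crossing link pairs) are illustrative choices, and the abstract does not specify how the paper defines its characteristics.

```python
from itertools import combinations
from math import sqrt


def sample_correlation(xs, ys):
    """Pearson sample correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def links_per_source_word(alignment, n_source_words):
    """Alignment density on the source side (hypothetical characteristic)."""
    return len(alignment) / n_source_words


def crossing_link_pairs(alignment):
    """Count pairs of links whose source order and target order disagree."""
    return sum(
        1
        for (i1, j1), (i2, j2) in combinations(sorted(alignment), 2)
        if (i1 - i2) * (j1 - j2) < 0
    )


if __name__ == "__main__":
    # Toy alignment for a three-word sentence pair (illustrative only, not data from the paper).
    toy_alignment = {(0, 0), (1, 2), (2, 1)}
    print(links_per_source_word(toy_alignment, 3))
    print(crossing_link_pairs(toy_alignment))
```

In a study of this kind, one value of each characteristic would be computed per alignment system, and `sample_correlation` would then be applied to that list paired with the corresponding BLEU scores (or phrase table sizes, or untranslated-word counts).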
