Journal: ACM Transactions on Asian Language Information Processing

Integrating Shallow Syntactic Labels in the Phrase-Boundary Translation Model



Abstract

This article proposes a hierarchical model for statistical machine translation based on a novel rule-labeling method. The model labels translation rules by matching the boundaries of target-side phrases against shallow syntactic labels, namely POS tags and chunk labels, on the target side of the training corpus. If no single label covers the whole target span, the boundary labels are concatenated. Labeling with the classes of the boundary words of target-side phrases was previously proposed as the phrase-boundary model, which can be regarded as the base form of our model. In the extended model, the labeler falls back to a POS tag when a boundary has no chunk label. By using chunks as phrase labels, the proposed model generalizes the rules and reduces model sparseness. Sparseness matters more for language pairs whose word orders differ widely, because such pairs yield fewer aligned phrase pairs from which rules can be extracted. The extended phrase-boundary model is also applicable to low-resource languages that lack a syntactic parser. Experiments compare the proposed model, the base phrase-boundary model, and variants of Syntax Augmented Machine Translation (SAMT) on translation from Persian and from German into English, language pairs whose source and target word orders differ. The results show that the proposed model improves both translation quality and decoding time. Measured in BLEU, the proposed model achieves a statistically significant improvement of about 0.5 points over the base phrase-boundary model.
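The labeling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_target_span`, the `(start, end, label)` chunk representation, the `+` separator, the use of a bare POS tag for an unchunked single word, and the requirement that a boundary chunk lie inside the span are all assumptions made here for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a chunk is (start, end_exclusive, label).
Chunk = Tuple[int, int, str]


def label_target_span(start: int, end: int, pos_tags: List[str],
                      chunks: List[Chunk], sep: str = "+") -> str:
    """Label the target span [start, end) with shallow syntactic labels."""
    # 1. Prefer a single chunk label that covers the whole span.
    for c_start, c_end, c_label in chunks:
        if c_start == start and c_end == end:
            return c_label
    # Assumption: a single word with no chunk label takes its POS tag.
    if end - start == 1:
        return pos_tags[start]
    # 2. Otherwise label each boundary: a chunk aligned with the boundary
    #    (and inside the span) if one exists, else the boundary word's POS tag.
    left = next((lab for s, e, lab in chunks if s == start and e <= end),
                pos_tags[start])
    right = next((lab for s, e, lab in chunks if e == end and s >= start),
                 pos_tags[end - 1])
    # 3. Concatenate the two boundary labels.
    return f"{left}{sep}{right}"


# Toy target sentence: "the old man saw a dog"
pos_tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
chunks = [(0, 3, "NP"), (3, 4, "VP"), (4, 6, "NP")]

whole_span = label_target_span(0, 3, pos_tags, chunks)    # chunk covers the span
two_chunks = label_target_span(0, 4, pos_tags, chunks)    # chunk label at both boundaries
pos_fallback = label_target_span(1, 4, pos_tags, chunks)  # left boundary has no chunk
```

In this toy example the span "the old man" receives the single chunk label, "the old man saw" receives two concatenated chunk labels, and "old man saw" falls back to a POS tag on its left boundary because no chunk starts there.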
