Journal: Computer Speech and Language

Phrase-boundary model for statistical machine translation



Abstract

This paper proposes a new probabilistic synchronous context-free grammar model for statistical machine translation. The model labels nonterminals with classes of boundary words on the target side of aligned phrase pairs. Rules are labeled with coarse-grained and fine-grained nonterminals, using POS tags and word clusters trained on the target-language corpus. Because the diversity of nonterminals makes the proposed model large, we also propose a novel approach to filtered rule extraction based on the alignment pattern of phrase pairs. Using a limited set of rule patterns, hierarchical rule extraction is restricted from phrase pairs that are decomposable into two aligned subphrases. The proposed filtered rule extraction considerably decreases the model size and the decoding time, with no significant impact on translation quality. With BLEU as the metric in our experiments, the proposed model achieved a notable improvement over the state-of-the-art hierarchical phrase-based model in translation from Persian, French, and Spanish into English. The approach is applicable to all languages, including under-resourced ones that have no linguistic tools.
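The labeling idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the boundary-word classes are combined into a label, so the scheme below (class of the first target word joined to the class of the last target word) is an assumption made for illustration only. The function name `boundary_label`, the `+` separator, the `UNK` fallback, and the toy class map are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def boundary_label(
    target_phrase: Sequence[str],
    word_class: Dict[str, str],
    unknown: str = "UNK",
) -> str:
    """Build a nonterminal label from the classes of the boundary words.

    Assumption (not from the paper): the label joins the class of the
    first target word and the class of the last target word with "+".
    """
    if not target_phrase:
        raise ValueError("target phrase must be non-empty")
    first = word_class.get(target_phrase[0], unknown)
    last = word_class.get(target_phrase[-1], unknown)
    return f"{first}+{last}"


# Toy class map. In the abstract's terms, the classes could be POS tags
# (where a tagger exists) or word-cluster IDs trained on the target corpus.
toy_classes = {"the": "DT", "red": "JJ", "house": "NN"}

label = boundary_label(["the", "red", "house"], toy_classes)
print(label)
```

Swapping the class map from POS tags to word-cluster IDs leaves the function unchanged, which is in line with the abstract's point that the model can be used for languages without linguistic tools.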
