Journal: ACM Transactions on Asian Language Information Processing

Bilingually Motivated Word Segmentation for Statistical Machine Translation

Abstract

We introduce a bilingually motivated word segmentation approach for languages in which word boundaries are not orthographically marked, with application to Phrase-Based Statistical Machine Translation (PB-SMT). Our approach is motivated by the insight that PB-SMT systems can be improved by optimizing the input representation to reduce the predictive power of translation models. We first present an approach that optimizes the existing segmentation of both the source and target languages for PB-SMT, and we demonstrate its effectiveness on a Chinese-English MT task by measuring the influence of the segmentation on the performance of PB-SMT systems. We report a 5.44% relative increase in BLEU score and a consistent increase according to other metrics. We then generalize this method to Chinese word segmentation without relying on any segmenters and show that, with our segmentation, PB-SMT can achieve more consistent state-of-the-art performance across two domains. Our approach has two main advantages. First, it is adapted to the specific translation task at hand by taking the corresponding source (target) language into account. Second, it does not rely on manually segmented training data, so it can be adapted automatically to different domains.
