Journal: ACM Transactions on Asian Language Information Processing

Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation

Abstract

We propose a new method for inducing a phrase-based translation model from a pair of unrelated monolingual corpora. Our method can handle phrases of arbitrary length and find phrase pairs that are useful for statistical machine translation, without requiring large parallel or comparable corpora. First, our method generates phrase pairs by coupling source and target phrases collected separately from the respective monolingual data. Then, for each phrase pair, we compute features using the monolingual data and a small quantity of parallel sentences. Finally, incorrect phrase pairs are pruned, and a phrase table is built from the remaining phrase pairs. In our experiments on French-Japanese and Spanish-Japanese translation tasks under low-resource conditions, we observe that incorporating a phrase table induced by our method into the machine translation system leads to large improvements in translation quality. Furthermore, we show that a phrase table induced by our method can also be useful in a wide range of configurations, including configurations where we already have access to large parallel corpora and configurations where only small monolingual corpora are available.
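
The three steps in the abstract (couple phrases, score each pair, prune) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which features are computed or how pruning is done, so the `lexical_score` feature, the `seed_lexicon` (standing in for the small quantity of parallel sentences), the n-gram phrase collection, and the `threshold` below are all assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter
from itertools import product


def collect_phrases(corpus, max_len=2, min_count=1):
    """Collect word n-grams (n <= max_len) that occur at least min_count times.

    Assumes sentences are already tokenized and whitespace-separated.
    """
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [phrase for phrase, c in counts.items() if c >= min_count]


def lexical_score(src_phrase, tgt_phrase, seed_lexicon):
    """Hypothetical feature: share of words on both sides that have a
    translation partner in the other phrase according to seed_lexicon."""
    src = src_phrase.split()
    tgt = tgt_phrase.split()
    covered_src = sum(1 for s in src if any((s, t) in seed_lexicon for t in tgt))
    covered_tgt = sum(1 for t in tgt if any((s, t) in seed_lexicon for s in src))
    return (covered_src + covered_tgt) / (len(src) + len(tgt))


def induce_phrase_table(src_corpus, tgt_corpus, seed_lexicon, threshold=1.0):
    """Couple phrases, score every pair, and keep pairs scoring >= threshold."""
    # Step 1: collect phrases separately from each monolingual corpus.
    src_phrases = collect_phrases(src_corpus)
    tgt_phrases = collect_phrases(tgt_corpus)
    table = {}
    # Steps 2 and 3: score every coupled pair, then prune low-scoring ones.
    for s, t in product(src_phrases, tgt_phrases):
        score = lexical_score(s, t, seed_lexicon)
        if score >= threshold:
            table[(s, t)] = score
    return table


if __name__ == "__main__":
    # Toy, unrelated monolingual "corpora" (pre-tokenized) and a tiny seed lexicon.
    src_corpus = ["le chat noir"]
    tgt_corpus = ["黒い 猫 が いる"]
    seed_lexicon = {("chat", "猫"), ("noir", "黒い")}
    for (s, t), score in induce_phrase_table(src_corpus, tgt_corpus, seed_lexicon).items():
        print(f"{s} ||| {t} ||| {score:.2f}")
```

Scoring the full Cartesian product of phrases is quadratic, so this is only workable at toy scale. A realistic system would need to restrict the candidate pairs before scoring them.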
