Journal: ACM Transactions on Asian Language Information Processing

Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora

Abstract

Comparable corpora are valuable alternatives to expensive parallel corpora. They contain informative parallel fragments that are useful resources for a range of natural language processing tasks. In this work, a generative model is proposed for the efficient extraction of parallel fragments from a pair of comparable documents. The core of the proposed model is a graph called the Matching Graph. Because the Matching Graph can be trained on a small initial seed, it is well suited to language pairs with scarce resources. Experiments show that the Matching Graph performs significantly better than other recently published models. According to experiments on the English-Persian and Arabic-Persian language pairs, the extracted parallel fragments can be used in place of parallel data for training statistical machine translation systems. Results show that, in the best case, the extracted fragments retrieve about 90% of the information of a statistical machine translation system trained on a parallel corpus. Moreover, using the extracted fragments as additional information for training statistical machine translation systems improves the BLEU score by about 2% for English-Persian and about 1% for Arabic-Persian translation.