Extraction of Bilingual Technical Terms for Chinese-Japanese Patent Translation

Abstract

The translation of patents and scientific papers is a key task that stands to benefit from statistical machine translation (SMT). In this paper, we propose a method to improve Chinese-Japanese patent SMT by pre-marking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. We use a sampling-based alignment method to identify aligned terms and set a threshold on translation probabilities to select the most promising bilingual multi-word terms. We then pre-mark a Chinese-Japanese training corpus with the selected bilingual multi-word terms. In our experiments on a Chinese-Japanese patent parallel corpus, we obtain over 70% precision in bilingual term extraction and a significant improvement in BLEU scores.
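The threshold-based selection and pre-marking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes both sides of the corpus are already word-segmented with spaces, that the sampling-based aligner yields translation probabilities in both directions for each candidate term pair, that a pair is kept only when both probabilities reach one threshold (the value 0.6 is arbitrary), and that "pre-marking" means joining a term's tokens into a single token with an underscore. The names `select_term_pairs` and `pre_mark` and the toy probabilities are hypothetical.

```python
from typing import Dict, List, Tuple

# (Chinese term, Japanese term); tokens inside a term are space-separated.
TermPair = Tuple[str, str]


def select_term_pairs(
    candidates: Dict[TermPair, Tuple[float, float]],
    threshold: float = 0.6,  # hypothetical value, not taken from the paper
) -> List[TermPair]:
    """Keep candidate pairs whose translation probabilities in BOTH
    directions (zh->ja, ja->zh) reach the threshold (assumed criterion)."""
    return sorted(
        pair
        for pair, (p_zh_to_ja, p_ja_to_zh) in candidates.items()
        if p_zh_to_ja >= threshold and p_ja_to_zh >= threshold
    )


def pre_mark(sentence: str, terms: List[str], joiner: str = "_") -> str:
    """Merge every occurrence of a multi-word term into a single token.

    Longest terms are tried first, so a short term nested inside a
    longer one cannot split the longer match. Empty terms are ignored.
    """
    tokens = sentence.split()
    patterns = sorted(
        (t.split() for t in terms if t.split()), key=len, reverse=True
    )
    marked: List[str] = []
    i = 0
    while i < len(tokens):
        for pattern in patterns:
            n = len(pattern)
            if tokens[i:i + n] == pattern:
                marked.append(joiner.join(pattern))
                i += n
                break
        else:  # no selected term starts at position i
            marked.append(tokens[i])
            i += 1
    return " ".join(marked)


if __name__ == "__main__":
    # Toy candidates with invented probabilities, for illustration only.
    candidates = {
        ("数据 处理 装置", "データ 処理 装置"): (0.82, 0.77),
        ("半导体 基板", "半導体 基板"): (0.91, 0.88),
        ("处理 装置", "基板"): (0.12, 0.05),
    }
    selected = select_term_pairs(candidates)
    zh_terms = [zh for zh, _ in selected]
    ja_terms = [ja for _, ja in selected]
    print(pre_mark("本 发明 涉及 数据 处理 装置 和 半导体 基板 。", zh_terms))
    # -> 本 发明 涉及 数据_处理_装置 和 半导体_基板 。
    print(pre_mark("本 発明 は データ 処理 装置 に 関する 。", ja_terms))
    # -> 本 発明 は データ_処理_装置 に 関する 。
```

Requiring both directions to clear the threshold favors precision over recall; thresholding a single direction or the product of the two probabilities would be equally plausible choices, and the abstract does not state which criterion the paper uses.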
