Conference: Annual conference of European Association for Machine Translation

Integrating a Large, Monolingual Corpus as Translation Memory into Statistical Machine Translation

Abstract

Translation memories (TM) are widely used in the localization industry to improve the consistency and speed of human translation. Several approaches have been presented to integrate the bilingual translation units of TMs into statistical machine translation (SMT). We present an extension of these approaches to the integration of partial matches found in a large, monolingual corpus in the target language, using cross-language information retrieval (CLIR) techniques. We use locality-sensitive hashing (LSH) for efficient coarse-grained retrieval of match candidates, which are then filtered by fine-grained fuzzy matching and finally used to re-rank the n-best SMT output. We show consistent and significant improvements over a state-of-the-art SMT system across different domains and language pairs, on tens of millions of sentences.
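The abstract names a three-stage pipeline: LSH for coarse retrieval of candidates from the target-language corpus, fine-grained fuzzy matching to filter them, and re-ranking of the SMT n-best list. The Python sketch below illustrates that general shape under our own assumptions; it is not the authors' implementation. It assumes MinHash LSH over word bigrams (one common LSH family for Jaccard similarity), a token-level edit-distance fuzzy score with a hypothetical threshold of 0.5, a query formed from each target-language hypothesis (the abstract does not say how the CLIR query is built), and a single re-ranking feature (the best fuzzy score) added to the model score with a hypothetical weight. All class and function names (`MinHashLSH`, `fuzzy_score`, `tm_feature`, `rerank`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(tokens, n=2):
    """Word n-grams ("shingles") of a tokenized sentence."""
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(shingle_set, num_hashes):
    """MinHash: for each seeded hash function keep the minimum hash value."""
    def h(seed, item):
        data = repr((seed, item)).encode("utf-8")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return [min(h(seed, s) for s in shingle_set) for seed in range(num_hashes)]


class MinHashLSH:
    """Coarse retrieval (assumed scheme): sentences sharing any band collide."""

    def __init__(self, num_bands=8, rows_per_band=4):
        self.bands, self.rows = num_bands, rows_per_band
        self.buckets = defaultdict(set)  # (band index, band values) -> sentence ids
        self.corpus = []                 # tokenized target-language sentences

    def _band_keys(self, tokens):
        sig = minhash_signature(shingles(tokens), self.bands * self.rows)
        return [(b, tuple(sig[b * self.rows:(b + 1) * self.rows]))
                for b in range(self.bands)]

    def add(self, sentence):
        tokens = sentence.split()
        sid = len(self.corpus)
        self.corpus.append(tokens)
        for key in self._band_keys(tokens):
            self.buckets[key].add(sid)

    def candidates(self, tokens):
        ids = set()
        for key in self._band_keys(tokens):
            ids |= self.buckets.get(key, set())
        return [self.corpus[i] for i in sorted(ids)]


def fuzzy_score(a, b):
    """Fine-grained match (assumed): 1 - token edit distance / longer length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def tm_feature(hyp_tokens, index, threshold=0.5):
    """Best fuzzy score among LSH candidates passing the threshold, else 0."""
    scores = (fuzzy_score(hyp_tokens, c) for c in index.candidates(hyp_tokens))
    return max((s for s in scores if s >= threshold), default=0.0)


def rerank(nbest, index, weight=1.0, threshold=0.5):
    """nbest: list of (hypothesis, model score); higher total is better."""
    return sorted(
        nbest,
        key=lambda h: h[1] + weight * tm_feature(h[0].split(), index, threshold),
        reverse=True)


index = MinHashLSH()
for s in ["the cat sat on the mat", "stock prices fell sharply today"]:
    index.add(s)
nbest = [("the cat sat on mat", -1.0), ("the cat sat on the mat", -1.1)]
print(rerank(nbest, index)[0][0])  # → the cat sat on the mat
```

Banding is what keeps the coarse stage cheap: with b bands of r rows, two sentences whose shingle sets have Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so near-duplicates are found without scanning the corpus, and the exact edit-distance check runs only on the few candidates that collide.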