18th Annual Conference of the European Association for Machine Translation

Integrating a Large, Monolingual Corpus as Translation Memory into Statistical Machine Translation



Abstract

Translation memories (TM) are widely used in the localization industry to improve the consistency and speed of human translation. Several approaches have been presented to integrate the bilingual translation units of TMs into statistical machine translation (SMT). We present an extension of these approaches to the integration of partial matches found in a large, monolingual corpus in the target language, using cross-language information retrieval (CLIR) techniques. We use locality-sensitive hashing (LSH) for efficient coarse-grained retrieval of match candidates, which are then filtered by fine-grained fuzzy matching and finally used to re-rank the n-best SMT output. We show consistent and significant improvements over a state-of-the-art SMT system across different domains and language pairs, on tens of millions of sentences.
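The three-stage pipeline in the abstract (coarse-grained LSH retrieval, fine-grained fuzzy filtering, n-best re-ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: banded MinHash over word bigrams stands in for the LSH scheme, the fuzzy match score is assumed to be one minus the word-level edit distance divided by the longer length, each n-best hypothesis (already in the target language) is used directly as a query in place of the paper's CLIR step, and the names `LSHIndex`, `fuzzy_match` and `rerank`, the 0.5 threshold and the additive feature weight are all hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(sentence, n=2):
    """Set of word n-grams (shingles) for a sentence."""
    tokens = sentence.lower().split()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(shingle_set, num_hashes=32):
    """MinHash: for each seeded hash function, keep the minimum hash over all shingles."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{' '.join(s)}".encode()).digest()[:8], "big")
            for s in shingle_set)
        for seed in range(num_hashes)
    ]


class LSHIndex:
    """Hypothetical banded MinHash index over a monolingual target-language corpus."""

    def __init__(self, num_hashes=32, bands=8):
        assert num_hashes % bands == 0
        self.num_hashes, self.bands = num_hashes, bands
        self.rows = num_hashes // bands
        self.buckets = defaultdict(set)
        self.sentences = []

    def _band_keys(self, sentence):
        sig = minhash_signature(shingles(sentence), self.num_hashes)
        for b in range(self.bands):
            yield (b, tuple(sig[b * self.rows:(b + 1) * self.rows]))

    def add(self, sentence):
        idx = len(self.sentences)
        self.sentences.append(sentence)
        for key in self._band_keys(sentence):
            self.buckets[key].add(idx)

    def candidates(self, query):
        """Coarse-grained step: sentences sharing at least one band bucket with the query."""
        ids = set()
        for key in self._band_keys(query):
            ids |= self.buckets.get(key, set())
        return [self.sentences[i] for i in sorted(ids)]


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def fuzzy_match(a, b):
    """Assumed fuzzy match score in [0, 1]: 1 - edit distance / longer length."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta and not tb:
        return 1.0
    return 1.0 - edit_distance(ta, tb) / max(len(ta), len(tb))


def rerank(nbest, index, threshold=0.5, weight=1.0):
    """Re-rank (hypothesis, smt_score) pairs with a fuzzy-match feature (hypothetical weighting)."""
    # Coarse-grained retrieval: union of LSH candidates over all hypotheses.
    candidates = set()
    for hyp, _ in nbest:
        candidates.update(index.candidates(hyp))
    # Fine-grained filtering: keep candidates whose best fuzzy match clears the threshold.
    matches = [c for c in candidates
               if max(fuzzy_match(hyp, c) for hyp, _ in nbest) >= threshold]

    def combined(item):
        hyp, smt_score = item
        tm_score = max((fuzzy_match(hyp, m) for m in matches), default=0.0)
        return smt_score + weight * tm_score

    return sorted(nbest, key=combined, reverse=True)
```

For example, with a corpus containing "the cat sat on the mat", the n-best list `[("the cat sit on the mat", -1.0), ("the cat sat on the mat", -1.1)]` is re-ordered so the second hypothesis comes first: its exact corpus match adds 1.0 to its score, while the first hypothesis gains only 5/6. Banding trades precision for recall: for a fixed number of hash functions, more bands with fewer rows each make similar sentences more likely to share a bucket, at the cost of more candidates for the fuzzy filter to check.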
