Integrating a Large, Monolingual Corpus as Translation Memory into Statistical Machine Translation


Abstract

Translation memories (TMs) are widely used in the localization industry to improve the consistency and speed of human translation. Several approaches have been presented to integrate the bilingual translation units of TMs into statistical machine translation (SMT). We present an extension of these approaches to the integration of partial matches found in a large, monolingual corpus in the target language, using cross-language information retrieval (CLIR) techniques. We use locality-sensitive hashing (LSH) for efficient coarse-grained retrieval of match candidates, which are then filtered by fine-grained fuzzy matching, and finally used to re-rank the n-best SMT output. We show consistent and significant improvements over a state-of-the-art SMT system, across different domains and language pairs on tens of millions of sentences.
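The three stages named in the abstract (coarse LSH retrieval, fine-grained fuzzy filtering, n-best re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes MinHash over character trigrams as the LSH family, uses `difflib.SequenceMatcher` as a stand-in for a TM fuzzy match score, queries the target-language corpus with the SMT hypotheses themselves, and combines scores with an assumed linear weight. All function names, parameters, and thresholds are hypothetical.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher

NUM_HASHES = 32  # assumed MinHash signature length
BAND_SIZE = 2    # assumed rows per LSH band (gives 16 bands)


def shingles(sentence, n=3):
    """Set of character n-grams of a whitespace-normalized, lower-cased sentence."""
    text = " ".join(sentence.lower().split())
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}


def minhash(sentence):
    """MinHash signature: for each seed, the minimum hash value over all shingles."""
    grams = shingles(sentence)
    return tuple(
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big"
            )
            for g in grams
        )
        for seed in range(NUM_HASHES)
    )


def bands(signature):
    """Split a signature into (offset, slice) keys; equal keys land in one bucket."""
    for start in range(0, NUM_HASHES, BAND_SIZE):
        yield start, signature[start:start + BAND_SIZE]


def build_index(corpus):
    """Coarse stage: map every band key to the ids of corpus sentences sharing it."""
    index = defaultdict(set)
    for sent_id, sentence in enumerate(corpus):
        for key in bands(minhash(sentence)):
            index[key].add(sent_id)
    return index


def fuzzy_score(a, b):
    """Token-level similarity in [0, 1]; a stand-in for a TM fuzzy match score."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(hypothesis, corpus, index, threshold=0.6):
    """Fine stage: best fuzzy score among LSH candidates, or 0.0 below the threshold."""
    candidates = set()
    for key in bands(minhash(hypothesis)):
        candidates |= index.get(key, set())
    best = max((fuzzy_score(hypothesis, corpus[i]) for i in candidates), default=0.0)
    return best if best >= threshold else 0.0


def rerank(nbest, corpus, index, weight=1.0):
    """Re-rank (hypothesis, model_score) pairs by model score plus weighted match score."""
    rescored = [
        (hyp, score + weight * best_match(hyp, corpus, index)) for hyp, score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a hypothesis that closely matches a sentence in the monolingual corpus receives a bonus and can overtake a hypothesis with a higher model score. The band size trades recall against the number of candidates passed to the more expensive fuzzy matcher.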

Bibliographic details

Source: Annual conference of European Association for Machine Translation, 2015, 8 pages
Authors: Katharina Waeschle; Stefan Riezler
Original format: PDF
Classification (Chinese Library Classification): Programming, software engineering

Similar literature

1. Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory [J]. Su H.-Y., Wu C.-H. IEEE Transactions on Audio, Speech, and Language Processing. 2009, No. 7
2. A unified framework and models for integrating translation memory into phrase-based statistical machine translation [J]. Yang Liu, Kun Wang, Chengqing Zong. Computer Speech and Language. 2019, March issue
3. Corrigendum to 'A unified framework and models for integrating translation memory into phrase-based statistical machine translation' [J]. Liu Yang, Wang Kun, Zong Chengqing. Computer Speech and Language. 2019, May issue
4. Integrating a Large, Monolingual Corpus as Translation Memory into Statistical Machine Translation [C]. Katharina Waeschle, Stefan Riezler. 18th Annual conference of European Association for Machine Translation. 2015
5. Cohesion in translation: A corpus study of human-translated, machine-translated, and non-translated texts (Russian into English) [D]. Bystrova-McIntyre, Tatyana. 2012
6. Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation [O]. Michael Adjeisah, Guohua Liu, Douglas Omwenga Nyabuga. 2021
7. Dynamic Translation Memory: Using Statistical Machine Translation to improve Translation Memory Fuzzy Matches [O]. Ergun Biçici, Marc Dymetman. 2010
