Journal: Computer Speech and Language

Translating without in-domain corpus: Machine translation post-editing with online learning techniques

Abstract

Globalization has dramatically increased the need to translate information from one language to another. Frequently, such translation needs must be satisfied under very tight time constraints. Machine translation (MT) techniques can provide a solution to this highly complex problem. However, the documents to be translated in real scenarios are often limited to a specific domain, such as a particular type of medical or legal text. This situation seriously hinders the applicability of MT, because building a reliable translation system is usually expensive regardless of the technology used, owing to the linguistic resources required, such as dictionaries, translation memories, or parallel texts. To solve this problem, we propose applying automatic post-editing in an online learning framework. The proposed technique allows a human expert to translate in a specific domain using a base translation system designed for a general domain, whose output is corrected (or adapted to the specific domain) by an automatic post-editing module. This module learns to make its corrections from user feedback in real time by means of online learning techniques. We have validated our system using different translation technologies to implement the base translation system, as well as several texts involving different domains and languages. In most cases, our results show significant improvements in terms of BLEU (up to 16 points) with respect to the baseline systems. The proposed technique works effectively when the n-grams of the document to be translated present a certain rate of repetition, a situation that is common according to the document-internal repetition property.
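The abstract's closing condition, that the document's n-grams repeat at a certain rate, can be illustrated with a small computation. The sketch below is not the paper's measure: the function name `ngram_repetition_rate` and the definition used (the fraction of n-gram occurrences whose n-gram appears more than once in the same document) are assumptions made for illustration only.

```python
from collections import Counter


def ngram_repetition_rate(tokens, n):
    """Fraction of n-gram occurrences whose n-gram occurs more than once.

    Hypothetical definition, chosen for illustration; the paper may
    measure document-internal repetition differently.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Document shorter than n tokens: nothing can repeat.
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that is seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Toy in-domain "document" with some repeated phrasing.
doc = "the patient was given the drug and the patient was discharged".split()
for n in (1, 2, 3):
    print(n, ngram_repetition_rate(doc, n))
```

Under this assumed definition, a higher rate means more of the document consists of material the post-editing module could already have seen corrected earlier in the same document, which is the setting in which the abstract says the technique works well.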
