Conference: Workshop on Asian Translation

Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation



Abstract

Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a large vocabulary because training and decoding complexity increase in proportion to the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In NMT, out-of-vocabulary words are represented by a single unknown token. In this paper, we propose a method that enables NMT to translate patent sentences comprising a large vocabulary of technical terms. We train an NMT system on bilingual data in which technical terms are replaced with technical term tokens; this allows it to translate most of each source sentence except the technical terms. We then use it as a decoder to translate source sentences containing technical term tokens, and replace the tokens with technical term translations obtained using SMT. We also use it to rerank the 1,000-best SMT translations on the basis of the average of the SMT score and the NMT rescoring score of the translated sentences with technical term tokens. Our experiments on Japanese-Chinese patent sentences show that the proposed NMT system achieves a substantial improvement of up to 3.1 BLEU points and 2.3 RIBES points over traditional SMT systems, and an improvement of approximately 0.6 BLEU points and 0.8 RIBES points over an equivalent NMT system without our proposed technique.
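The token replacement and reranking steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `TT1`, `TT2`, ... placeholder format, the use of a pre-extracted term list, and the assumption that both scores are on a comparable scale where higher is better (for example, log-probabilities) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Term = Tuple[str, ...]


def mask_terms(tokens: List[str], terms: Sequence[Term]) -> Tuple[List[str], Dict[str, Term]]:
    """Replace each listed technical term with an indexed placeholder token.

    Hypothetical helper: the placeholder format "TT<n>" is an assumption.
    """
    # Try longer terms first so a long term is not split by a shorter one.
    ordered = sorted(terms, key=len, reverse=True)
    masked: List[str] = []
    mapping: Dict[str, Term] = {}   # placeholder -> original term
    seen: Dict[Term, str] = {}      # original term -> placeholder
    i = 0
    while i < len(tokens):
        for term in ordered:
            if term and tuple(tokens[i:i + len(term)]) == term:
                if term not in seen:
                    seen[term] = f"TT{len(seen) + 1}"
                    mapping[seen[term]] = term
                masked.append(seen[term])
                i += len(term)
                break
        else:
            # No term starts at this position; keep the ordinary word.
            masked.append(tokens[i])
            i += 1
    return masked, mapping


def restore_terms(tokens: List[str], translations: Dict[str, Term]) -> List[str]:
    """Swap placeholders in the output for term translations (e.g. from SMT)."""
    return [word for tok in tokens for word in translations.get(tok, (tok,))]


def rerank(candidates: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Order (sentence, smt_score, nmt_score) candidates by the mean score, best first.

    Assumes both scores are comparable and that higher means better.
    """
    scored = [(sent, (smt + nmt) / 2.0) for sent, smt, nmt in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    source = "the thin film transistor is on the substrate".split()
    term_list = [("thin", "film", "transistor"), ("substrate",)]
    masked, mapping = mask_terms(source, term_list)
    print(masked)
    print(mapping)
    print(rerank([("candidate a", -2.0, -4.0), ("candidate b", -3.0, -1.0)]))
```

In this sketch the NMT model would only ever see the masked sequence, so its vocabulary does not need to cover the technical terms themselves; the term translations are supplied separately and substituted back in by `restore_terms`.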
