Workshop on Neural Generation and Translation

On the Importance of Word Boundaries in Character-level Neural Machine Translation

Abstract

Neural Machine Translation (NMT) models generally translate using a fixed-size lexical vocabulary, which is an important bottleneck for their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically with external tools based on arbitrary heuristics, which yields vocabulary units that are not optimized for the translation task. Recent studies have shown that the same approach can be extended to perform NMT directly at the level of characters, which can deliver translation accuracy on par with subword-based models; however, this requires relatively deeper networks. In this paper, we propose a more computationally efficient solution for character-level NMT that implements a hierarchical decoding architecture, in which translations are generated first at the level of words and then, within each word, at the level of characters. We evaluate different methods for open-vocabulary NMT on translation from English into five languages with distinct morphological typologies. We show that the hierarchical decoding model can reach higher translation accuracy than the subword-level NMT model while using significantly fewer parameters. It also demonstrates a better capacity than the standard character-level NMT model for learning longer-distance contextual and grammatical dependencies.
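The abstract describes hierarchical decoding only at a high level. The toy Python sketch below shows one plausible control flow under stated assumptions. A word-level step decides whether to start another word and supplies a state for it. A character-level step then spells that word out until it emits a hypothetical end-of-word marker `</w>`. The names `hierarchical_decode`, `toy_word_step`, `toy_char_step`, `TARGET` and the `max_words`/`max_chars` limits are all hypothetical. The neural decoders are replaced by hard-coded stand-ins that replay a fixed target sentence. The sketch therefore illustrates only how explicit word boundaries structure the generation loop, not the paper's actual model or training.

```python
from typing import Any, Callable, List, Tuple

EOW = "</w>"  # hypothetical end-of-word marker emitted by the character-level step


def hierarchical_decode(
    word_step: Callable[[List[str]], Tuple[Any, bool]],
    char_step: Callable[[Any, str], str],
    max_words: int = 50,
    max_chars: int = 30,
) -> List[str]:
    """Generate a sentence word by word, spelling each word out character by character."""
    words: List[str] = []
    for _ in range(max_words):
        # Word level: conditioned on the words generated so far, decide whether
        # to stop and, if not, produce a state for the next word.
        state, stop = word_step(words)
        if stop:
            break
        # Character level: spell out one word from that state until end-of-word
        # (or until the hypothetical max_chars safety limit is reached).
        chars = ""
        for _ in range(max_chars):
            c = char_step(state, chars)
            if c == EOW:
                break
            chars += c
        words.append(chars)
    return words


# Hypothetical stand-ins for trained decoders: they simply replay a fixed target.
TARGET = ["das", "Haus", "ist", "klein"]


def toy_word_step(prefix: List[str]) -> Tuple[int, bool]:
    i = len(prefix)  # the "state" is just the index of the next target word
    return i, i >= len(TARGET)


def toy_char_step(state: int, chars_so_far: str) -> str:
    word = TARGET[state]
    n = len(chars_so_far)
    return word[n] if n < len(word) else EOW


print(" ".join(hierarchical_decode(toy_word_step, toy_char_step)))  # prints "das Haus ist klein"
```

In this sketch, a word ends only when the character-level step emits `</w>`. The next word always starts from a fresh word-level state. Word boundaries are therefore explicit in the decoding structure rather than being just another character, such as a space, in a flat character sequence.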