Neural Machine Translation of Rare Words with Subword Units



Abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.
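
The byte pair encoding segmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the most frequent adjacent symbol pair is merged repeatedly, starting from single characters. The function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word counts are illustrative assumptions, not details given on this page, and the code is not presented as the authors' implementation.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dictionary."""
    # Each word starts as a sequence of characters plus an end-of-word marker
    # (the "</w>" marker is an assumption of this sketch).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        # Ties are broken by first occurrence, which keeps the result deterministic.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: word -> frequency.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_counts, 5)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

In this toy run the word "lowest" never appears in the training counts, yet it is split into two known subword units, which is the open-vocabulary behaviour the abstract describes.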


Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2016 | pp. 1715-1725 | 11 pages
Authors
Rico Sennrich; Barry Haddow; Alexandra Birch
Original format: PDF

Similar literature


1. Finding Better Subwords for Tibetan Neural Machine Translation [J]. Li Yachao, Jiang Jing, Jia Yangji. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, No. 2
2. A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation [J]. Yang Muyun, Liu Shujie, Chen Kehai. IEEE Transactions on Fuzzy Systems. 2020, No. 5
3. Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation [J]. Gronroos Stig-Arne, Virpioja Sami, Kurimo Mikko. Machine Translation. 2020, No. 4
4. Neural Machine Translation of Rare Words with Subword Units [C]. Rico Sennrich, Barry Haddow, Alexandra Birch. Annual Meeting of the Association for Computational Linguistics. 2016
5. Evolving neural net circuit modules to detect characters of the alphabet and sequences of characters (words) using the cellular automata module-brain machine [D]. DeCesare, Derek. 2001
6. An ensemble of neural models for nested adverse drug events and medication extraction with subwords [O]. Meizhi Ju, Nhung T H Nguyen, Makoto Miwa. 2020
7. Neural Machine Translation of Rare Words with Subword Units [O]. Sennrich, Rico, Haddow, Barry, Birch, Alexandra. 2016
8. Modeling words with subword units in an articulatorily constrained speech recognition algorithm [R]. Hogden, J. 1997
