Conference on Empirical Methods in Natural Language Processing (EMNLP)

Variable-Length Word Encodings for Neural Translation Models



Abstract

Recent work in neural machine translation has shown promising performance, but the most effective architectures do not scale naturally to large vocabulary sizes. We propose and compare three variable-length encoding schemes that represent a large vocabulary corpus using a much smaller vocabulary with no loss in information. Common words are unaffected by our encoding, but rare words are encoded using a sequence of two pseudo-words. Our method is simple and effective: it requires no complete dictionaries, learning procedures, increased training time, changes to the model, or new parameters. Compared to a baseline that replaces all rare words with an unknown word symbol, our best variable-length encoding strategy improves WMT English-French translation performance by up to 1.7 BLEU.
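The scheme described above (common words kept verbatim, rare words replaced by a sequence of two pseudo-words) can be sketched as follows. This is a minimal illustration, not the paper's actual encoding strategies: the particular pseudo-word assignment here, indexing each rare word by its rank split into a prefix and suffix code, is an assumption made for demonstration only.

```python
def build_encoding(vocab, num_common, base):
    """Map each word to a short token sequence (illustrative scheme).

    vocab:      words sorted by descending frequency (assumed given)
    num_common: number of words kept unchanged
    base:       number of distinct suffix pseudo-words

    Rare word with rare-rank r becomes the pair <P r//base> <S r%base>,
    so the model vocabulary shrinks to num_common plus a small set of
    pseudo-words, with no loss of information.
    """
    codes = {}
    for i, word in enumerate(vocab):
        if i < num_common:
            codes[word] = [word]  # common word: unaffected
        else:
            r = i - num_common  # rank among rare words
            codes[word] = [f"<P{r // base}>", f"<S{r % base}>"]
    return codes


def encode(tokens, codes):
    """Replace each word by its one- or two-token code."""
    return [t for w in tokens for t in codes[w]]


def decode(tokens, codes):
    """Invert the encoding: pseudo-word pairs map back to one rare word."""
    inverse = {tuple(seq): w for w, seq in codes.items()}
    out, i = [], 0
    while i < len(tokens):
        if (tokens[i],) in inverse:  # common word, single token
            out.append(inverse[(tokens[i],)])
            i += 1
        else:  # prefix + suffix pseudo-word pair
            out.append(inverse[(tokens[i], tokens[i + 1])])
            i += 2
    return out
```

A round trip through `encode` and `decode` recovers the original sentence exactly, which is the "no loss in information" property the abstract claims for the variable-length encodings.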
