Conference on Empirical Methods in Natural Language Processing (EMNLP)

Variable-Length Word Encodings for Neural Translation Models



Abstract

Recent work in neural machine translation has shown promising performance, but the most effective architectures do not scale naturally to large vocabulary sizes. We propose and compare three variable-length encoding schemes that represent a large vocabulary corpus using a much smaller vocabulary with no loss in information. Common words are unaffected by our encoding, but rare words are encoded using a sequence of two pseudo-words. Our method is simple and effective: it requires no complete dictionaries, learning procedures, increased training time, changes to the model, or new parameters. Compared to a baseline that replaces all rare words with an unknown word symbol, our best variable-length encoding strategy improves WMT English-French translation performance by up to 1.7 BLEU.
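The abstract describes the core idea but not the concrete schemes: common words pass through unchanged, while each rare word is replaced by a unique pair of pseudo-words, so a vocabulary of N pseudo-words can losslessly cover up to N² rare types. Below is a minimal sketch of one such scheme; the function names, the `<P…>`/`<Q…>` pseudo-word format, and the row/column pair assignment are illustrative assumptions, not the paper's actual encodings.

```python
# Hypothetical sketch of a lossless two-pseudo-word encoding for rare words.
# Common words are unaffected; each rare word maps to a unique (<Pi>, <Qj>)
# pair, so n_pseudo**2 pairs must cover all rare types.
from collections import Counter

def build_codec(corpus_tokens, common_size, n_pseudo):
    """Keep the common_size most frequent words as-is; assign every other
    word a unique pseudo-word pair via its index in row-major order."""
    freq = Counter(corpus_tokens)
    common = {w for w, _ in freq.most_common(common_size)}
    rare = sorted(w for w in freq if w not in common)
    assert len(rare) <= n_pseudo ** 2, "pseudo-word alphabet too small"
    enc, dec = {}, {}
    for i, w in enumerate(rare):
        pair = (f"<P{i // n_pseudo}>", f"<Q{i % n_pseudo}>")
        enc[w] = pair
        dec[pair] = w
    return common, enc, dec

def encode(tokens, common, enc):
    out = []
    for t in tokens:
        if t in common:
            out.append(t)        # common word: emitted unchanged
        else:
            out.extend(enc[t])   # rare word: emitted as two pseudo-words
    return out

def decode(tokens, dec):
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].startswith("<P"):       # first half of a rare-word pair
            out.append(dec[(tokens[i], tokens[i + 1])])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Because every rare word round-trips through its pair exactly, the encoding loses no information, while the model's vocabulary shrinks from the full word inventory to the common words plus the small pseudo-word alphabet.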
