JMLR: Workshop and Conference Proceedings

Fast Decoding in Sequence Models Using Discrete Latent Variables


Abstract

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, are the state of the art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training, but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation.
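The two-stage decoding scheme the abstract describes, first autoregressively generating a short discrete latent sequence and then reconstructing the full target sequence from it in parallel, can be sketched roughly as follows. This is a minimal illustration under assumed stand-ins (latent_prior_step, parallel_decoder, and a compression factor of 8 are hypothetical placeholders), not the paper's actual learned Transformer components or its discretization method.

import numpy as np

rng = np.random.default_rng(0)

def latent_prior_step(latents_so_far, latent_vocab=512):
    # Hypothetical stand-in for the autoregressive latent model: predicts the
    # next discrete latent code given the codes generated so far.
    return int(rng.integers(latent_vocab))

def parallel_decoder(latent_codes, target_len, target_vocab=32000):
    # Hypothetical stand-in for the parallel decoder: maps the short latent
    # sequence to all target positions at once, with no dependency between
    # output positions.
    return rng.integers(target_vocab, size=target_len)

def fast_decode(target_len, compression=8):
    # Stage 1 (sequential, but over a sequence `compression` times shorter
    # than the target): generate the discrete latent sequence autoregressively.
    latent_len = max(1, target_len // compression)
    latents = []
    for _ in range(latent_len):
        latents.append(latent_prior_step(latents))
    # Stage 2 (parallel): decode every position of the full target sequence
    # in one pass, conditioned on the short latent sequence.
    return parallel_decoder(np.array(latents), target_len)

print(fast_decode(target_len=64))

Because the sequential loop runs only over the shorter latent sequence, the number of autoregressive steps drops by the compression factor, which is the source of the decoding speedup claimed in the abstract.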

