JMLR: Workshop and Conference Proceedings

Fast Decoding in Sequence Models Using Discrete Latent Variables


Abstract

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, are the state of the art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training, but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation.
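The two-stage decoding scheme the abstract describes, first autoregressively generating a short discrete latent sequence and then reconstructing the full target sequence from it in parallel, can be sketched roughly as follows. This is a minimal illustration under assumed stand-ins (latent_prior_step, parallel_decoder, and a compression factor of 8 are hypothetical placeholders), not the paper's actual learned Transformer components or its discretization method.

import numpy as np

rng = np.random.default_rng(0)

def latent_prior_step(latents_so_far, latent_vocab=512):
    # Hypothetical stand-in for the autoregressive latent model: predicts the
    # next discrete latent code given the codes generated so far.
    return int(rng.integers(latent_vocab))

def parallel_decoder(latent_codes, target_len, target_vocab=32000):
    # Hypothetical stand-in for the parallel decoder: maps the short latent
    # sequence to all target positions at once, with no dependency between
    # output positions.
    return rng.integers(target_vocab, size=target_len)

def fast_decode(target_len, compression=8):
    # Stage 1 (sequential, but over a sequence `compression` times shorter
    # than the target): generate the discrete latent sequence autoregressively.
    latent_len = max(1, target_len // compression)
    latents = []
    for _ in range(latent_len):
        latents.append(latent_prior_step(latents))
    # Stage 2 (parallel): decode every position of the full target sequence
    # in one pass, conditioned on the short latent sequence.
    return parallel_decoder(np.array(latents), target_len)

print(fast_decode(target_len=64))

Because the sequential loop runs only over the shorter latent sequence, the number of autoregressive steps drops by the compression factor, which is the source of the decoding speedup claimed in the abstract.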

