Conference on Empirical Methods in Natural Language Processing

Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation


Abstract

We propose an efficient inference procedure for non-autoregressive machine translation that iteratively refines the translation purely in the continuous space. Given a continuous latent variable model for machine translation (Shu et al., 2020), we train an inference network to approximate the gradient of the marginal log probability of the target sentence, using only the latent variable as input. This allows us to use gradient-based optimization at inference time to find the target sentence that approximately maximizes its marginal probability. As each refinement step only involves computation in a latent space of low dimensionality (we use 8 in our experiments), we avoid the computational overhead incurred by existing non-autoregressive inference procedures, which often refine in token space. We compare our approach to a recently proposed EM-like inference procedure (Shu et al., 2020) that optimizes in a hybrid space consisting of both discrete and continuous variables. We evaluate our approach on WMT'14 En→De, WMT'16 Ro→En and IWSLT'16 De→En, and observe two advantages over the EM-like inference: (1) it is computationally efficient, i.e., each refinement step is twice as fast, and (2) it is more effective, resulting in higher marginal probabilities and BLEU scores with the same number of refinement steps. On WMT'14 En→De, for instance, our approach is able to decode 6.2 times faster than the autoregressive model with minimal degradation in translation quality (0.9 BLEU).
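
The refinement loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic `toy_log_prob` stands in for the marginal log probability, `make_toy_estimator` stands in for the trained inference network (here it simply returns the exact gradient of the toy objective), and the step size and step count are arbitrary assumptions. Only the latent dimensionality of 8 comes from the abstract.

```python
import random

LATENT_DIM = 8  # latent dimensionality reported in the abstract


def toy_log_prob(z, optimum):
    """Hypothetical stand-in for the marginal log probability (concave quadratic)."""
    return -0.5 * sum((zi - oi) ** 2 for zi, oi in zip(z, optimum))


def make_toy_estimator(optimum):
    """Build a stand-in for the trained inference network.

    The returned function takes only the latent vector as input, mirroring the
    abstract. In this toy it returns the exact gradient of toy_log_prob; a real
    network would only approximate the gradient of the true objective.
    """
    def estimate_gradient(z):
        return [oi - zi for zi, oi in zip(z, optimum)]
    return estimate_gradient


def refine(z, estimate_gradient, steps=4, step_size=0.5):
    """Gradient ascent purely in latent space: z <- z + step_size * g(z)."""
    trajectory = [list(z)]
    for _ in range(steps):
        grad = estimate_gradient(z)
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
        trajectory.append(list(z))
    return trajectory


rng = random.Random(0)
optimum = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]  # toy maximizer
z0 = [0.0] * LATENT_DIM                                        # initial latent
trajectory = refine(z0, make_toy_estimator(optimum))
scores = [toy_log_prob(z, optimum) for z in trajectory]
```

In this toy, every step halves the distance to the maximizer, so `scores` increases at each refinement step. In the setting the abstract describes, the refined latent variable would then be decoded into a target sentence, a stage this sketch leaves out.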
