Conference on Empirical Methods in Natural Language Processing

Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation



Abstract

The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be pruned after training. However, removing them before training a model results in lower quality. In this paper, we apply the lottery ticket hypothesis to prune heads in the early stages of training, instead of doing so on a fully converged model. Our experiments on machine translation show that it is possible to remove up to three-quarters of all attention heads from a transformer-big model with an average -0.1 change in BLEU for Turkish→English. The pruned model is 1.5 times as fast at inference, albeit at the cost of longer training. The method is complementary to other approaches, such as teacher-student, with our English→German student losing 0.2 BLEU at 75% encoder attention sparsity.
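For illustration, below is a minimal PyTorch-style sketch of the general idea of masking out attention heads: a per-head binary mask zeroes the output of pruned heads, and a simple confidence score (the mean maximum attention weight per head) decides which heads to keep. This is not the paper's implementation; the lottery-ticket training schedule is omitted, and the names multi_head_attention, head_confidence, prune_heads, and keep_fraction are hypothetical.

# Illustrative sketch only: per-head masking with a simple confidence heuristic.
import torch
import torch.nn.functional as F


def multi_head_attention(q, k, v, num_heads, head_mask):
    """Scaled dot-product attention with a per-head binary mask.

    q, k, v: tensors of shape (batch, seq_len, d_model)
    head_mask: tensor of shape (num_heads,); 1.0 keeps a head, 0.0 prunes it.
    """
    batch, seq_len, d_model = q.shape
    d_head = d_model // num_heads

    def split(x):  # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        return x.view(batch, seq_len, num_heads, d_head).transpose(1, 2)

    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(-2, -1) / d_head ** 0.5
    attn = F.softmax(scores, dim=-1)                 # (batch, heads, seq, seq)
    out = attn @ vh                                  # (batch, heads, seq, d_head)
    out = out * head_mask.view(1, num_heads, 1, 1)   # zero out pruned heads
    out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
    return out, attn


def head_confidence(attn):
    """Mean of the maximum attention weight per head (a common confidence proxy)."""
    return attn.max(dim=-1).values.mean(dim=(0, 2))  # (num_heads,)


def prune_heads(attn, keep_fraction=0.25):
    """Keep only the most confident fraction of heads."""
    conf = head_confidence(attn)
    num_keep = max(1, int(round(keep_fraction * conf.numel())))
    keep = conf.topk(num_keep).indices
    mask = torch.zeros_like(conf)
    mask[keep] = 1.0
    return mask


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 5, 16)                        # toy batch
    mask = torch.ones(4)                             # start with all 4 heads
    _, attn = multi_head_attention(x, x, x, num_heads=4, head_mask=mask)
    mask = prune_heads(attn, keep_fraction=0.25)     # keep a quarter of the heads
    out, _ = multi_head_attention(x, x, x, num_heads=4, head_mask=mask)
    print("kept heads:", mask.tolist(), "output shape:", tuple(out.shape))

The keep_fraction of 0.25 mirrors the abstract's setting of removing up to three-quarters of all heads; in the paper this is decided early in training via the lottery ticket hypothesis rather than by a one-shot confidence threshold as in this toy sketch.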
