Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation

Abstract

The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be pruned after training. However, removing them before training a model results in lower quality. In this paper, we apply the lottery ticket hypothesis to prune heads in the early stages of training, instead of doing so on a fully converged model. Our experiments on machine translation show that it is possible to remove up to three-quarters of all attention heads from a transformer-big model with an average -0.1 change in BLEU for Turkish→English. The pruned model is 1.5 times as fast at inference, albeit at the cost of longer training. The method is complementary to other approaches, such as teacher-student, with our English→German student losing 0.2 BLEU at 75% encoder attention sparsity.

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2020 | pp. 2664-2674 | 11 pages
Authors
Maximiliana Behnke; Kenneth Heafield
Original format: PDF

Similar literature

1. Re-Transformer: A Self-Attention Based Model for Machine Translation [J]. Huey-Ing Liu, Wei-Lin Chen. Procedia Computer Science. 2021, issue a
2. Improving Transformer-Based Neural Machine Translation with Prior Alignments [J]. Thien Nguyen, Lam Nguyen, Phuoc Tran, Complexity. 2021, issue a
3. Attention over Heads: A Multi-Hop Attention for Neural Machine Translation [C]. Shohei Iida, Ryuichiro Kimura, Hongyi Cui, Annual meeting of the Association for Computational Linguistics. 2019
4. Gibbs Pruning: A Framework for Structured and Unstructured Neural Network Pruning [D]. Labach, Alexander Jacob. 2020
5. Neural Processing Underlying Executive Functions in Bilinguals: Heads I Win Tails You Lose [O]. Jesús Cespón. 2021
6. Every Layer Counts: Multi-Layer Multi-Head Attention for Neural Machine Translation [O]. Isaac Kojo Essel Ampomah, Sally McClean, Lin Zhiwei, 2020
