Annual Meeting of the Association for Computational Linguistics

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned



Abstract

Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. We find that the most important and confident heads play consistent and often linguistically-interpretable roles. When pruning heads using a method based on stochastic gates and a differentiable relaxation of the L_0 penalty, we observe that specialized heads are last to be pruned. Our novel pruning method removes the vast majority of heads without seriously affecting performance. For example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads results in a drop of only 0.15 BLEU.
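The pruning mechanism the abstract refers to can be pictured as follows: each encoder head gets a scalar gate that multiplies the head's output, the gates are sampled from a Hard Concrete distribution (a stretched and clipped concrete/Gumbel-sigmoid, after Louizos et al., 2018), and the expected number of non-zero gates acts as a differentiable surrogate for the L_0 penalty. The snippet below is a minimal PyTorch sketch under that reading; the class name HeadGate, the hyperparameter defaults, and the way it would be wired into training are assumptions, not the authors' released code.

```python
import math

import torch
import torch.nn as nn


class HeadGate(nn.Module):
    """Hard Concrete gates over attention heads: a stochastic gate per head
    with a differentiable relaxation of the L_0 penalty.
    Hypothetical sketch, not the paper's implementation."""

    def __init__(self, n_heads: int, beta: float = 2.0 / 3.0,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_heads))  # per-head gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            # reparameterized sample from the Hard Concrete distribution
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # stretch to (gamma, zeta) and clip to [0, 1], so gates can hit exactly 0 or 1
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self) -> torch.Tensor:
        # expected number of open gates: the differentiable surrogate for L_0
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

In use, each head's output would be multiplied by its gate before the output projection of the attention layer, and a term such as lambda * gate.expected_l0() would be added to the translation loss; heads whose gates collapse to zero contribute nothing and can be removed entirely at inference, which is how the abstract's 38-of-48 pruning result is obtained without retraining the remaining heads from scratch.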
