Revealing the Dark Secrets of BERT

Abstract

BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by the individual BERT heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.
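To make the kind of analysis described in the abstract concrete, the sketch below shows how per-head self-attention maps can be extracted from a pretrained BERT and how selected heads can be disabled (pruned). This is not the authors' code; it is a minimal illustration assuming the HuggingFace transformers library, with the model name, the "attention to [CLS]" feature, and the pruned head indices chosen purely for demonstration.

```python
# Minimal sketch: inspect per-head attention patterns and prune selected heads.
# Assumes HuggingFace `transformers`; choices below are illustrative, not the paper's setup.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # One simple hand-crafted feature of interest: how much each head
    # attends, on average, to the [CLS] token at position 0.
    cls_share = layer_attn[0, :, :, 0].mean(dim=-1)
    print(f"layer {layer_idx}: attention to [CLS] per head = "
          f"{[round(v, 3) for v in cls_share.tolist()]}")

# Disable (prune) two heads in layer 0; the indices here are hypothetical.
model.prune_heads({0: [2, 7]})
```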
