
Noise Stability Regularization for Improving BERT Fine-tuning



Abstract

Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available, and its brittleness is often reflected in its sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which has been investigated in recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method to improve fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend the theory of adding noise to the input and prove that our method yields a more stable regularization effect. We provide supporting evidence by experimentally confirming that well-performing models show low sensitivity to noise, and that fine-tuning with LNSR exhibits markedly better generalizability and stability. Furthermore, our method also demonstrates advantages over other state-of-the-art algorithms, including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020) and SMART (Jiang et al., 2020).
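The abstract does not spell out the exact objective, but as a rough illustration, a layer-wise noise stability penalty of this flavor could be sketched in PyTorch as below: Gaussian noise is injected into a layer's input and the resulting change in that layer's output is penalized. The helper name lnsr_penalty, the noise scale sigma and the weight lambda_lnsr are hypothetical illustrations, not the authors' implementation.

    import torch
    import torch.nn as nn

    def lnsr_penalty(layer: nn.Module, hidden: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
        # Output of the layer on the clean hidden representation.
        clean_out = layer(hidden)
        # Output of the same layer when Gaussian noise is injected into its input
        # (sigma is an assumed noise scale, not a value from the paper).
        noisy_out = layer(hidden + sigma * torch.randn_like(hidden))
        # Penalize how far the layer's output moves under the perturbation (squared L2).
        return (noisy_out - clean_out).pow(2).sum(dim=-1).mean()

    # Hypothetical use during fine-tuning: add the per-layer penalties to the task loss,
    # e.g. loss = task_loss + lambda_lnsr * sum(lnsr_penalty(l, h) for l, h in layer_inputs)

In such a setup, the penalty would be accumulated over the layers being regularized and weighted against the task loss; the actual layers, noise distribution, and weighting scheme are as defined in the paper.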
