
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets



Abstract

Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computation resources and extremely long training time for both pre-training and fine-tuning. Many works have studied model compression on large NLP models, but they focus only on reducing inference time while still requiring an expensive training process. Other works use extremely large batch sizes to shorten the pre-training time, at the expense of higher computational resource demands. In this paper, inspired by the Early-Bird Lottery Tickets recently studied for computer vision tasks, we propose EarlyBERT, a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models. By slimming the self-attention and fully-connected sub-layers inside a transformer, we are the first to identify structured winning tickets in the early stage of BERT training. We apply those tickets towards efficient BERT training, and conduct comprehensive pre-training and fine-tuning experiments on GLUE and SQuAD downstream tasks. Our results show that EarlyBERT achieves comparable performance to standard BERT, with 35~45% less training time.
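As a rough illustration of the slimming idea mentioned in the abstract (not the authors' released implementation), the sketch below attaches learnable coefficients to each self-attention head and each feed-forward unit of a transformer layer; an L1 penalty drives unimportant coefficients toward zero, and the heads and units with the smallest coefficients can then be pruned to form a structured early-bird ticket. All module and function names here (SlimmedTransformerLayer, slimming_l1_penalty) are hypothetical.

```python
# Minimal sketch of slimming coefficients on attention heads and FFN units,
# assuming PyTorch; names are illustrative, not from the EarlyBERT codebase.
import torch
import torch.nn as nn

class SlimmedTransformerLayer(nn.Module):
    """Transformer layer with learnable slimming coefficients on each
    self-attention head and each hidden unit of the feed-forward sub-layer."""
    def __init__(self, hidden=768, heads=12, ffn_hidden=3072):
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.attn_out = nn.Linear(hidden, hidden)
        self.ffn_in = nn.Linear(hidden, ffn_hidden)
        self.ffn_out = nn.Linear(ffn_hidden, hidden)
        # Slimming coefficients: one per attention head, one per FFN unit.
        self.head_coef = nn.Parameter(torch.ones(heads))
        self.ffn_coef = nn.Parameter(torch.ones(ffn_hidden))

    def forward(self, x):
        b, t, h = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.heads, self.head_dim).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        ctx = attn.softmax(dim=-1) @ v                    # (b, heads, t, d)
        ctx = ctx * self.head_coef.view(1, -1, 1, 1)      # scale each head
        ctx = ctx.transpose(1, 2).reshape(b, t, h)
        x = x + self.attn_out(ctx)
        ffn = torch.relu(self.ffn_in(x)) * self.ffn_coef  # scale each FFN unit
        return x + self.ffn_out(ffn)

def slimming_l1_penalty(model, strength=1e-4):
    """L1 regularizer on the slimming coefficients; added to the training loss
    so that unimportant heads/units shrink and can be pruned early."""
    return strength * sum(p.abs().sum()
                          for n, p in model.named_parameters() if "coef" in n)
```

In this reading, the ticket is drawn after only a short period of training: heads and FFN units whose coefficients fall below a threshold are removed, and the remaining slimmed network is trained to completion.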

