Conference on Empirical Methods in Natural Language Processing

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

Abstract

In this paper, we propose a novel model compression approach that effectively compresses BERT by progressive module replacing. Our approach first divides the original BERT into several modules and builds their compact substitutes. Then, we randomly replace the original modules with their substitutes to train the compact modules to mimic the behavior of the original ones, and we progressively increase the probability of replacement over the course of training. In this way, our approach brings a deeper level of interaction between the original and compact models. Unlike previous knowledge distillation approaches for BERT compression, our approach does not introduce any additional loss function. It outperforms existing knowledge distillation approaches on the GLUE benchmark, demonstrating a new perspective on model compression.
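
The replacement scheme described in the abstract can be made concrete in a few lines. Below is a minimal sketch in PyTorch of the progressive module replacing idea as we read it: tiny `nn.Sequential` blocks stand in for groups of BERT layers, and the names `TheseusEncoder` and `replace_rate_at` are illustrative, not from the authors' released code. The linear schedule is an assumption; the abstract says only that the replacement probability increases during training.

```python
# A minimal sketch of progressive module replacing, assuming PyTorch.
# Toy blocks stand in for grouped BERT layers; all names here are
# illustrative stand-ins, not the paper's actual implementation.
import torch
import torch.nn as nn

def replace_rate_at(step, base_rate=0.3, k=1e-3):
    # Assumed linear schedule: the probability of running a compact
    # substitute grows with the training step, capped at 1.0 so that
    # training ends with the compact model acting alone.
    return min(1.0, base_rate + k * step)

class TheseusEncoder(nn.Module):
    def __init__(self, predecessors, successors):
        super().__init__()
        assert len(predecessors) == len(successors)
        self.predecessors = nn.ModuleList(predecessors)
        self.successors = nn.ModuleList(successors)
        # Freeze the original modules: their parameters are never updated,
        # but gradients still flow through them to earlier compact modules.
        for p in self.predecessors.parameters():
            p.requires_grad_(False)
        self.replace_rate = 0.5

    def forward(self, x):
        for pred, succ in zip(self.predecessors, self.successors):
            # Per module and per forward pass, flip a coin: run the compact
            # substitute with probability replace_rate, else the original.
            if self.training and torch.rand(()).item() < self.replace_rate:
                x = succ(x)
            else:
                x = pred(x)
        return x

hidden = 16
# Each original module is two toy layers; its substitute is one,
# halving the depth of the compact model.
preds = [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                       nn.Linear(hidden, hidden), nn.GELU())
         for _ in range(3)]
succs = [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())
         for _ in range(3)]
model = TheseusEncoder(preds, succs)

# Only the compact modules are trained, and only on the task loss:
# no extra distillation loss term is introduced.
opt = torch.optim.Adam(model.successors.parameters(), lr=1e-3)
for step in range(200):
    model.replace_rate = replace_rate_at(step)
    x = torch.randn(8, hidden)
    loss = model(x).pow(2).mean()  # stand-in for a downstream task loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that the only trained parameters are the successors' and the only objective is the task loss, consistent with the paper's claim that no additional loss function is introduced beyond the original training objective.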