International Conference on Parsing Technologies

Distilling Neural Networks for Greener and Faster Dependency Parsing


Abstract

The carbon footprint of natural language processing research has been increasing in recent years due to its reliance on large and inefficient neural network implementations. Distillation is a network compression technique which attempts to impart knowledge from a large model to a smaller one. We use teacher-student distillation to improve the efficiency of the Biaffine dependency parser which obtains state-of-the-art performance with respect to accuracy and parsing speed (Dozat and Manning, 2017). When distilling to 20% of the original model's trainable parameters, we only observe an average decrease of ~1 point for both UAS and LAS across a number of diverse Universal Dependency treebanks while being 2.30x (1.19x) faster than the baseline model on CPU (GPU) at inference time. We also observe a small increase in performance when compressing to 80% for some treebanks. Finally, through distillation we attain a parser which is not only faster but also more accurate than the fastest modern parser on the Penn Treebank.
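The abstract summarizes teacher-student distillation without giving its objective. The sketch below is a minimal, generic illustration of the standard soft-target recipe (Hinton et al., 2015) in PyTorch, not the authors' actual training setup: the function name, the temperature, and the mixing weight alpha are illustrative assumptions, and in the parsing setting the logits would stand in for the Biaffine parser's arc or label scores for each token.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5):
    """Generic teacher-student distillation loss (hypothetical hyperparameters):
    a weighted mix of (a) KL divergence between the student's and the frozen
    teacher's temperature-softened distributions and (b) ordinary cross-entropy
    against the gold labels."""
    # Soft targets from the teacher; softening with T > 1 exposes the
    # teacher's relative preferences over non-gold classes.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays
    # comparable across temperature settings.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, gold_labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

A smaller student network trained under a mixed loss of this kind is what allows the parameter reductions reported in the abstract (e.g., to 20% of the teacher's trainable parameters) while keeping accuracy close to the teacher's.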
