
Attention-based Bidirectional Long Short-Term Memory Networks for Relation Classification Using Knowledge Distillation from BERT



Abstract

Relation classification is an important task in the field of natural language processing. Today the best-performing models often use huge, transformer-based neural architectures such as BERT and XLNet and have hundreds of millions of network parameters. These large neural networks have led to the belief that the shallow neural networks of the previous generation for relation classification are obsolete. However, because of their large size and low inference speed, these models may be impractical in online real-time systems or resource-restricted systems. To address this issue, we try to accelerate these well-performing language models by compressing them. Specifically, we distill knowledge for relation classification from a huge, transformer-based language model, BERT, into an Attention-Based Bidirectional Long Short-Term Memory Network. We evaluate our model on the SemEval-2010 relation classification task. According to the experimental results, the performance of our model exceeds that of other LSTM-based methods and almost catches up with that of BERT. Our model has about 157 times fewer network parameters than BERT and, as a result, its inference time is about 229 times shorter.
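The distillation setup described in the abstract can be illustrated compactly. The following is a minimal sketch, not the authors' implementation: it assumes a fine-tuned BERT teacher that already produces per-sentence class logits, an illustrative attention-based BiLSTM student (layer sizes and the 19-class SemEval-2010 Task 8 label set are assumptions), and the standard temperature-scaled soft-target loss blended with hard-label cross-entropy.

```python
# Minimal sketch of BERT -> Att-BiLSTM knowledge distillation for relation
# classification. Hyperparameters and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttBiLSTMStudent(nn.Module):
    """Attention-based BiLSTM student; sizes are illustrative, not the paper's."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=100, num_classes=19):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # word-level attention scores
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))      # (B, T, 2H) hidden states
        alpha = torch.softmax(self.attn(h), dim=1)   # (B, T, 1) attention weights
        sent = (alpha * h).sum(dim=1)                # attention-pooled sentence vector
        return self.out(sent)                        # class logits


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL against the teacher with hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

During training, the teacher's logits for each batch are computed once (with gradients disabled) and passed to `distillation_loss` alongside the student's logits; only the student's parameters are updated, which is what yields the smaller, faster model at inference time.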
