SciBERT: A Pretrained Language Model for Scientific Text

Abstract

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2019), to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification, and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
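The abstract points to the released code and pretrained models at https://github.com/allenai/scibert/. As a minimal sketch of how such a checkpoint might be used for downstream encoding, the example below loads SciBERT through the Hugging Face Transformers library; the model identifier "allenai/scibert_scivocab_uncased" and the use of Transformers (rather than the repository's own scripts) are assumptions for illustration, not details given in the abstract.

```python
# Minimal sketch. Assumption: the "allenai/scibert_scivocab_uncased" checkpoint
# is loadable via Hugging Face Transformers; the abstract itself only points to
# the GitHub repository.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode one scientific sentence and inspect the contextual embeddings.
inputs = tokenizer("The enzyme catalyzes the hydrolysis of ATP.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, num_tokens, hidden_size)
```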
