SciBERT: A Pretrained Language Model for Scientific Text

Abstract

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2019), to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification, and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks. The code and pretrained models are available at https://github.com/allenai/scibert/.
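The abstract points to the released code and pretrained models at https://github.com/allenai/scibert/. As a minimal sketch of how such a checkpoint might be used for downstream encoding, the example below loads SciBERT through the Hugging Face Transformers library; the model identifier "allenai/scibert_scivocab_uncased" and the use of Transformers (rather than the repository's own scripts) are assumptions for illustration, not details given in the abstract.

```python
# Minimal sketch. Assumption: the "allenai/scibert_scivocab_uncased" checkpoint
# is loadable via Hugging Face Transformers; the abstract itself only points to
# the GitHub repository.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# Encode one scientific sentence and inspect the contextual embeddings.
inputs = tokenizer("The enzyme catalyzes the hydrolysis of ATP.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, num_tokens, hidden_size)
```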
