Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus



Abstract

Contextual word embedding models, such as BioBERT and Bio_ClinicalBERT, have achieved state-of-the-art results on biomedical natural language processing tasks by focusing their pre-training process on domain-specific corpora. However, such models do not take into consideration structured expert domain knowledge from a knowledge base. We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process via a novel knowledge augmentation strategy. More specifically, the augmentation of UmlsBERT with the Unified Medical Language System (UMLS) Metathesaurus is performed in two ways: (i) connecting words that have the same underlying 'concept' in UMLS, and (ii) leveraging semantic type knowledge in UMLS to create clinically meaningful input embeddings. By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models on common named-entity recognition (NER) and clinical natural language inference tasks.
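The second augmentation strategy described above — adding semantic type knowledge to the input embeddings — can be illustrated with a minimal sketch. In standard BERT, each input embedding is the sum of token, position, and segment embeddings; the idea here, shown with small illustrative numpy tables (all names, sizes, and the fourth lookup table are assumptions, not the paper's actual implementation), is to add one more learned embedding indexed by the token's UMLS semantic type:

```python
import numpy as np

# Illustrative sketch: BERT-style input embeddings augmented with a
# semantic-type embedding table, as described in the UmlsBERT abstract.
# Table names and dimensions are hypothetical, chosen small for clarity.
rng = np.random.default_rng(0)
VOCAB, TYPES, MAXLEN, DIM = 100, 5, 16, 8

tok_emb = rng.normal(size=(VOCAB, DIM))      # token (wordpiece) embeddings
pos_emb = rng.normal(size=(MAXLEN, DIM))     # position embeddings
seg_emb = rng.normal(size=(2, DIM))          # segment (sentence A/B) embeddings
sem_emb = rng.normal(size=(TYPES + 1, DIM))  # UMLS semantic types; index 0 = no type

def input_embeddings(token_ids, segment_ids, sem_type_ids):
    """Sum the four lookup tables position-wise.

    Standard BERT uses the first three terms; the fourth term injects
    the token's UMLS semantic type into the input representation.
    """
    n = len(token_ids)
    return (tok_emb[token_ids]
            + pos_emb[np.arange(n)]
            + seg_emb[segment_ids]
            + sem_emb[sem_type_ids])

# Three tokens; the last two tagged with (hypothetical) semantic type 2.
emb = input_embeddings([3, 17, 42], [0, 0, 0], [0, 2, 2])
print(emb.shape)  # (3, 8)
```

Because the semantic-type table is trained jointly with the rest of the model, two surface forms sharing a UMLS type receive a shared additive component in their input representations, which is one way clinical knowledge can shape the embeddings before any transformer layer is applied.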


