Journal: BMC Bioinformatics

Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

Abstract

Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, the task is challenging because of name variation and entity ambiguity: a biomedical entity may have multiple name variants, and a single variant may denote several different entity identifiers. To address these issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On the one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used for better entity normalization. Moreover, deep contextualized word representations generated by pre-trained language models are incorporated into the system to model the multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that the proposed system achieves an F1-score of 0.871 for PNER and 0.445 for PNEN, a new state-of-the-art performance. In summary, we propose a knowledge-enhanced system that combines entity knowledge with deep contextualized word representations. Comparison results show that entity knowledge benefits both the PNER and PNEN tasks and combines well with contextualized information in our system for further improvement.
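
The ambiguity the abstract describes (one entity with several name variants, one variant pointing to several identifiers) can be illustrated with a small lookup sketch. This is not the authors' system: the knowledge base contents, the `KB:` identifiers, and the function names `build_variant_index` and `candidate_ids` are all hypothetical, and a plain case-insensitive dictionary lookup stands in for the learned recognition and ID-embedding components the paper proposes.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: identifier -> known name variants.
# The identifiers are invented for illustration only.
TOY_KB = {
    "KB:1": ["p53", "TP53", "tumor protein p53"],
    "KB:2": ["p53", "Trp53"],
    "KB:3": ["BRCA1"],
}


def build_variant_index(kb):
    """Invert the knowledge base: lowercased name variant -> set of IDs."""
    index = defaultdict(set)
    for entity_id, names in kb.items():
        for name in names:
            index[name.lower()].add(entity_id)
    return index


def candidate_ids(mention, index):
    """Return the sorted candidate identifiers for a text mention."""
    return sorted(index.get(mention.lower(), set()))


index = build_variant_index(TOY_KB)
print(candidate_ids("P53", index))    # two candidates: the variant is ambiguous
print(candidate_ids("brca1", index))  # one candidate
```

A mention such as "P53" returns more than one candidate here, which is the point at which a normalization step has to use context or structural knowledge about the entities to choose a single identifier.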