BMC Bioinformatics

Broad-coverage biomedical relation extraction with SemRep

Abstract

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In the other, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep's performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. The error analysis identifies named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. Recall and F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships; this is a fairer evaluation, since SemRep operates at the sentence level. SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes.
In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
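The F1 scores reported above are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes them from the reported precision/recall pairs as a consistency check. One assumption is flagged in the code: the abstract does not restate precision for the sentence-bound CDR subset, so the sketch assumes it stays at 0.90.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
reported = {
    "strict (manual dataset)": (0.55, 0.34),
    "relaxed (manual dataset)": (0.69, 0.42),
    "CDR (all relations)": (0.90, 0.24),
    # Assumption: precision is unchanged at 0.90 for the sentence-bound
    # subset; the abstract only reports the new recall and F1 there.
    "CDR (sentence-bound)": (0.90, 0.35),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

Run as-is, this prints F1 values of 0.42, 0.52, 0.38, and 0.50, matching the figures stated in the abstract.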