BMC Bioinformatics

Structured learning for spatial information extraction from biomedical text: bacteria biotopes

Abstract

Background: We aim to automatically extract the species names of bacteria and their locations from webpages. This task is important for exploiting the vast amount of biological knowledge expressed in diverse natural language texts and for storing that knowledge in databases where biologists can access it easily. The task is challenging, and previous results fall far below an acceptable level of performance, particularly for the extraction of localization relationships. We therefore aim to design a new system for such extractions within the framework of structured machine learning.

Results: We design a new model for the joint extraction of biomedical entities and the localization relationship. Our model is based on a spatial role labeling (SpRL) model designed for spatial understanding of unrestricted text. We extend SpRL to extract discourse-level spatial relations in the biomedical domain and apply it to the Bacteria Biotopes (BB) shared task of BioNLP-ST 2013. We highlight the main differences between general spatial language understanding and spatial information extraction from scientific text, which is the focus of this work. We exploit the structure of the text and discourse-level global features. Our model and the designed features substantially improve on previous systems, achieving an absolute improvement of approximately 57 percent in F1 measure over the best previous system for this task.

Conclusions: Our experimental results indicate that a joint learning model over all entities and relationships in a document outperforms a model that extracts entities and relationships independently. Our global learning model significantly improves the state-of-the-art results on this task and has high potential to be adopted in other natural language processing (NLP) tasks in the biomedical domain.
