BMC Bioinformatics

Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

Abstract

Background
The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org ) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions in the scientific literature is critical to supporting research in this field. Natural Language Processing (NLP), and Information Extraction (IE) technology in particular, can significantly aid this process.

Description
We trained a state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include: Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text-mining application.

Conclusion
Our text-mining application is available online on the ERIC website at http://www.ericbrc.org/portal/eric/articles . The information retrieval interface displays a list of recently published enteropathogen literature abstracts and also provides a search interface for running custom queries by keyword, date range, etc. When an abstract is selected, the processed abstract and the entities and relations extracted from it are retrieved from a relational database and marked up to highlight those entities and relations. The marked-up abstract also provides links from extracted genes and gene products to the ERIC Annotations database, giving access to comprehensive genomic annotations and adding value to both the text-mining and annotation systems.
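The entity and relation scores above are F-measures. The following is a minimal sketch of how such scores are commonly computed. The abstract does not state the matching criterion or how the per-category scores were averaged. This sketch therefore assumes exact matching of annotated items, F1 (β = 1), and an unweighted (macro) mean over entity categories. The `(start, end, label)` span format and the function names `prf` and `macro_f1` are hypothetical and are not taken from the ERIC system.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical entity annotation: (start_offset, end_offset, label).
Span = Tuple[int, int, str]


def prf(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall and F1 (F-measure with beta = 1) under exact matching.

    Works for any hashable items, e.g. entity spans or (gene, role) relation pairs.
    """
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Per-label F1 plus the unweighted mean over labels found in the gold set.

    The macro average is an assumption; a micro average would pool all labels.
    """
    labels = sorted({label for _, _, label in gold})
    scores: Dict[str, float] = {}
    for label in labels:
        gold_l = {s for s in gold if s[2] == label}
        pred_l = {s for s in predicted if s[2] == label}
        scores[label] = prf(gold_l, pred_l)[2]
    scores["average"] = sum(scores[l] for l in labels) / len(labels) if labels else 0.0
    return scores


if __name__ == "__main__":
    # Toy data, not from the paper: one spurious Gene prediction at (30, 35).
    gold = {(0, 4, "Gene"), (10, 14, "Gene"), (20, 25, "Operon")}
    pred = {(0, 4, "Gene"), (10, 14, "Gene"), (30, 35, "Gene"), (20, 25, "Operon")}
    print(macro_f1(gold, pred))
```

In this toy run, the extra Gene prediction lowers Gene precision to 2/3 while recall stays at 1, which gives a Gene F1 of 0.8 (up to floating-point rounding). Operon scores 1.0, so the macro average is 0.9. Relations such as (gene, role) pairs can be scored with `prf` in the same way, as long as they are represented as hashable tuples.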