Published in: Database: The Journal of Biological Databases and Curation

eGenPub a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality



Abstract

UniProt Knowledgebase (UniProtKB) is a publicly available database that provides access to a vast amount of protein sequence and functional information. To widen the scope of the publications associated with a protein entry, UniProt has introduced the computationally mapped additional bibliography section, which includes literature collected from external sources. In this article, we describe a text mining system, eGenPub, which selects articles that are ‘about’ specific proteins and automatically identifies additional bibliography for given UniProt protein entries. Focusing initially on plant proteins, eGenPub combines a gene normalization tool called pGenN with a trained support vector machine (SVM) model, which achieves a precision of 95.3%, to predict from an article's abstract whether the article should be linked to a given UniProt entry. We have run eGenPub over the full PubMed collection for eight common plant species. Altogether, 9025 articles were identified as relevant bibliography for 4752 UniProt entries; 5252 of these are additional papers not already listed in the existing publication section. The additional bibliography newly mapped by eGenPub is being integrated into the UniProt production pipeline and can be accessed through the publication view of each UniProtKB protein entry.
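The abstract says eGenPub pairs pGenN gene normalization with an SVM that decides, from the abstract alone, whether an article is ‘about’ a UniProt entry, but it does not spell out the features. The Python sketch below is therefore only an illustration under stated assumptions. The three "centrality" features (gene named in the title, share of all gene mentions, fraction of sentences mentioning the gene), the input format and every function name are hypothetical. The classifier is a generic linear SVM trained with Pegasos-style stochastic subgradient descent, not eGenPub's trained model.

```python
"""Hypothetical sketch: decide whether an abstract is 'about' a normalized gene.

Feature definitions, input format and names are illustrative assumptions,
not eGenPub's actual features or trained model.
"""
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def centrality_features(gene_id: str,
                        title_genes: Sequence[str],
                        sentence_genes: Sequence[Sequence[str]]) -> Vector:
    """Three hypothetical 'centrality' features for one (abstract, gene) pair.

    title_genes and sentence_genes hold normalized gene IDs, as a gene
    normalization step (e.g. pGenN) might produce them; the format is assumed.
    Each feature is centered on 0, so the linear SVM below uses no bias term.
    """
    mentions = [g for sentence in sentence_genes for g in sentence]
    share = mentions.count(gene_id) / len(mentions) if mentions else 0.0
    covered = sum(gene_id in sentence for sentence in sentence_genes)
    coverage = covered / len(sentence_genes) if sentence_genes else 0.0
    in_title = 1.0 if gene_id in title_genes else -1.0
    return [in_title, share - 0.5, coverage - 0.5]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(data: Sequence[Tuple[Vector, int]], lam: float = 0.01,
                  epochs: int = 100, seed: int = 0) -> Vector:
    """Linear SVM (L2 + hinge loss) fitted by Pegasos-style SGD; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(list(data), len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violates = y * dot(w, x) < 1.0  # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrinkage
            if violates:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge subgradient
    return w


def is_about(w: Vector, features: Vector) -> bool:
    """Propose linking the article to the entry when the SVM score is positive."""
    return dot(w, features) > 0.0


# Toy labelled pairs: (candidate gene, title genes, per-sentence genes, label).
TOY_PAIRS = [
    ("geneA", ["geneA"], [["geneA"], ["geneA", "geneB"], ["geneA"]], 1),
    ("geneB", ["geneB"], [["geneB"], ["geneC"]], 1),
    ("geneC", [], [["geneA"], ["geneA", "geneC"], ["geneA"]], -1),
    ("geneD", ["geneA"], [["geneA"], ["geneB"]], -1),
]
TRAIN = [(centrality_features(g, title, sents), y) for g, title, sents, y in TOY_PAIRS]

if __name__ == "__main__":
    w = train_pegasos(TRAIN)
    new_abstract = centrality_features("geneE", ["geneE"], [["geneE"], ["geneE"]])
    print(is_about(w, new_abstract))
```

The features are centered on zero, so the sketch omits a bias term and keeps the Pegasos update short. A production system would more likely use an established SVM library such as LIBSVM or scikit-learn, with features and training data chosen by curators.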
