Journal: mSystems

PaperBLAST: Text Mining Papers for Information about Homologs

Abstract

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.

IMPORTANCE: With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.
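
The two steps described above (linking identifiers to articles, then looking up articles for similar proteins) can be illustrated with a minimal Python sketch. This is not PaperBLAST's implementation: the article texts, the identifiers (abcX, defY, ghiZ), and the function names build_index and papers_for_homologs are all hypothetical, the full-text search is approximated with a simple whole-word regular expression over in-memory strings rather than EuropePMC, and the list of similar proteins is assumed to come from a separate BLAST run that is not shown.

```python
import re
from collections import defaultdict


def build_index(articles, identifiers, width=40):
    """Map each identifier to (article_id, snippet) pairs.

    An identifier only counts as mentioned when it appears as a whole
    word, so "abcX" does not match inside "abcX2".
    """
    index = defaultdict(list)
    for article_id, text in articles.items():
        for ident in identifiers:
            pattern = r"(?<!\w)" + re.escape(ident) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                # Keep up to `width` characters of context on each side.
                start = max(0, m.start() - width)
                end = min(len(text), m.end() + width)
                index[ident].append((article_id, text[start:end].strip()))
    return index


def papers_for_homologs(homolog_ids, index):
    """Return snippets for those homologs that are mentioned in any article."""
    return {h: index[h] for h in homolog_ids if h in index}


# Made-up article texts and identifiers, for illustration only.
articles = {
    "article-1": "Deletion of abcX increased sensitivity to osmotic stress.",
    "article-2": "The abcX2 paralog was not required, but defY was essential.",
}
index = build_index(articles, ["abcX", "defY"])

# Assumed output of a separate similarity search (e.g. BLAST) for a query protein.
homologs = ["abcX", "ghiZ"]
result = papers_for_homologs(homologs, index)

for ident, hits in result.items():
    for article_id, snippet in hits:
        print(ident, article_id, snippet)
```

In this toy data, ghiZ is similar to the query but never mentioned in an article, so it is dropped from the result; only homologs with literature hits are reported.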