BMC Bioinformatics

PepBank - a database of peptides based on sequence text mining and public peptide data sources

Abstract

Background

Peptides are important molecules with diverse biological functions and biomedical uses. To date, there is no single, searchable archive of peptide sequences or their associated biological data. Instead, peptide sequences still have to be mined from abstracts and full-length articles and/or obtained from fragmented public sources.

Description

We have constructed a new database, PepBank, which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. A second component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full-text articles and text-mining results. We show the utility of the database in several examples of affinity ligand discovery.

Conclusion

We have created, and maintain, a database of peptide sequences. The database has biological and medical applications. For example, it can be used to predict the binding partners of biologically interesting peptides, to develop peptide-based therapeutic or diagnostic agents, or to predict the molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/. The text-mining source code (Peptide::Pubmed) is freely available at the same address as well as on CPAN ( http://www.cpan.org/ ).
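The abstract lists Smith-Waterman search among PepBank's capabilities. The sketch below illustrates what that kind of search computes: a local alignment that finds the best-scoring shared stretch between two peptide sequences. It is not PepBank's implementation. The scoring scheme is a hypothetical simplification (match +2, mismatch -1, linear gap -2), whereas real protein searches typically use a substitution matrix such as BLOSUM62 with affine gap penalties. The abstract does not say which scoring PepBank uses.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b). The scoring parameters are
    hypothetical illustration values, not PepBank's actual settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x: str, y: str) -> int:
        # Simple match/mismatch score in place of a substitution matrix
        return match if x == y else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment start anew anywhere (local alignment)
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("CRGDC", "GRGDSP"))  # (6, 'RGD', 'RGD')
```

With these scores, aligning CRGDC against GRGDSP recovers the shared RGD stretch with a score of 6. Extending the alignment by one residue on either side adds a mismatch and lowers the score. The dynamic-programming table costs O(mn) time and memory. That is acceptable for short peptides, but it is one reason heuristic tools such as BLAST are usually preferred when scanning a whole database.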