Molecular Cellular Proteomics : MCP

Published and Perished? The Influence of the Searched Protein Database on the Long-Term Storage of Proteomics Data


Abstract

In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or even be deleted over time.

To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data, we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database as of November 2010. To map each submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second approach (called the logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier.

Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnology Information nr database (NCBI nr), and Ensembl) with respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries as deleted than the logical mapping algorithm. We found several cases where experiments already contained more than 10% deleted identifiers at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB, using two releases per year, starting from 2005.

This analysis showed for the first time the true effect of changing protein identifiers on proteomics data. Based on these findings, UniProtKB seems the best database for applications that rely on the long-term storage of proteomics data.
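
The two mapping strategies and the peptide check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call the PICR service or any real database: the in-memory tables, the example accessions and sequences, and the function names (`logical_status`, `sequence_identity_map`, `peptide_fit_fraction`, `summarize`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical snapshot of a source database:
# identifier -> (status, replacement identifier or None)
STATUS_TABLE = {
    "P12345": ("active", None),
    "Q00001": ("replaced", "P12345"),
    "IPI00000001": ("deleted", None),
}

# Hypothetical current entries: identifier -> protein sequence
CURRENT_SEQUENCES = {
    "P12345": "MKTAYIAKQRQISFVK",
    "P67890": "MSTNPKPQRKTKRNTN",
}


def logical_status(identifier, status_table):
    """Look up an identifier and follow replacement links.

    Returns (status, active_identifier), where status is one of
    'active', 'replaced', 'deleted', or 'unknown'.
    """
    seen = set()
    current = identifier
    was_replaced = False
    while current not in seen:  # guard against circular replacements
        seen.add(current)
        status, successor = status_table.get(current, ("unknown", None))
        if status == "replaced" and successor is not None:
            was_replaced = True
            current = successor
            continue
        if status == "active":
            return ("replaced" if was_replaced else "active", current)
        return (status, None)
    return ("unknown", None)


def sequence_identity_map(submitted_sequence, current_sequences):
    """Return current identifiers whose sequence is 100% identical."""
    return sorted(
        acc for acc, seq in current_sequences.items()
        if seq == submitted_sequence
    )


def peptide_fit_fraction(peptides, protein_sequence):
    """Fraction of peptides that still occur in the given protein sequence."""
    if not peptides:
        return 0.0
    return sum(p in protein_sequence for p in peptides) / len(peptides)


def summarize(identifiers, status_table):
    """Proportion of identifiers per status for one experiment."""
    counts = Counter(logical_status(i, status_table)[0] for i in identifiers)
    total = len(identifiers)
    return {status: n / total for status, n in counts.items()}


if __name__ == "__main__":
    experiment = ["P12345", "Q00001", "IPI00000001", "X99999"]
    print(summarize(experiment, STATUS_TABLE))
    print(sequence_identity_map("MKTAYIAKQRQISFVK", CURRENT_SEQUENCES))
    print(peptide_fit_fraction(["TAYIAK", "AAAAAA"], CURRENT_SEQUENCES["P12345"]))
```

In this sketch, an identifier counts as deleted under the status lookup only when the source table says so, whereas an exact-sequence match fails whenever the sequence has changed at all. That is one plausible reason the two approaches could report different deletion rates, though the abstract does not state the cause.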
