Journal: Molecular Cellular Proteomics : MCP

A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets

Abstract

Calculating the number of confidently identified proteins and estimating the false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further adds to the challenge, and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that become particularly apparent when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel, target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprising ∼19,000 LC-MS/MS runs deposited in ProteomicsDB. The “picked” protein FDR approach treats the target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence, depending on which receives the higher score. We investigated the performance of this approach in combination with q-value-based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein, yielding a stable number of true-positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identifications in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications, and is readily implemented in proteomics analysis software.
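The difference between the "classic" and "picked" strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each protein has one score where higher is better (for example, a transform of its best peptide q-value), that a decoy is keyed by the same identifier as its target, that ties go to the target, and that FDR is estimated as the running ratio of decoys to targets. The function names `classic_entries`, `picked_entries`, and `running_fdr` are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, bool, float]       # (protein id, is_decoy, score)
Row = Tuple[str, bool, float, float]  # Entry plus the running FDR estimate


def classic_entries(target: Dict[str, float], decoy: Dict[str, float]) -> List[Entry]:
    """Classic strategy: every target and every decoy is its own entity."""
    entries = [(name, False, score) for name, score in target.items()]
    entries += [(name, True, score) for name, score in decoy.items()]
    return entries


def picked_entries(target: Dict[str, float], decoy: Dict[str, float]) -> List[Entry]:
    """Picked strategy: keep only the better-scoring member of each pair."""
    entries: List[Entry] = []
    for name in sorted(set(target) | set(decoy)):
        t = target.get(name)
        d = decoy.get(name)
        # Assumption: a tie is resolved in favour of the target.
        if d is None or (t is not None and t >= d):
            entries.append((name, False, t))
        else:
            entries.append((name, True, d))
    return entries


def running_fdr(entries: List[Entry]) -> List[Row]:
    """Sort by descending score and estimate FDR as decoys / targets so far."""
    ordered = sorted(entries, key=lambda e: (-e[2], e[1], e[0]))
    rows: List[Row] = []
    n_target = 0
    n_decoy = 0
    for name, is_decoy, score in ordered:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        rows.append((name, is_decoy, score, n_decoy / max(n_target, 1)))
    return rows


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    target_scores = {"P1": 10.0, "P2": 8.0, "P3": 3.0, "P4": 2.0}
    decoy_scores = {"P1": 1.0, "P2": 2.5, "P3": 4.0, "P4": 1.5}
    for label, builder in (("classic", classic_entries), ("picked", picked_entries)):
        print(label)
        for row in running_fdr(builder(target_scores, decoy_scores)):
            print("  ", row)
```

In the classic list, low-scoring decoys of proteins that are confidently identified as targets still count toward the decoy total; in the picked list they are discarded with their pair. This is one way to read the over-representation of decoys that the abstract attributes to the classic strategy in very large data sets.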
