Background: Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates, and proteomic analyses generate huge datasets that need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is commonly used to identify peptides and proteins from two-dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS²) data. A number of proteomic pipelines that facilitate high-throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their identification probabilities, and it detects significant changes in protein expression using parametric and non-parametric statistics.