The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to measure phosphopeptides on an increasingly large scale and in a high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large datasets in a similarly high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to re-analyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the SEQUEST scores. A linear discriminant analysis is then performed to determine how to optimally combine the peptide scores (in this case, SEQUEST scores) into a single discriminant score with maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides, and has been coded into a software package that is freely available.
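The final step, turning per-identification p-values into q-values and an FDR-controlled list, can be illustrated with a minimal sketch. The abstract does not state which q-value procedure the pipeline uses, so the code below assumes the standard Benjamini-Hochberg step-up rule (equivalent to taking the proportion of true nulls as 1); the function names `q_values` and `accept_at_fdr` are hypothetical and are not taken from the authors' software.

```python
def q_values(p_values):
    """Convert p-values to q-values with the Benjamini-Hochberg step-up rule.

    Assumption: the proportion of true null identifications is taken as 1,
    which is conservative. The published pipeline may use a different estimator.
    """
    m = len(p_values)
    # Indices of the identifications, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def accept_at_fdr(p_values, fdr=0.05):
    """Return indices of identifications whose q-value is at or below `fdr`."""
    return [i for i, q in enumerate(q_values(p_values)) if q <= fdr]
```

Under these assumptions, accepting every identification with a q-value at or below a chosen threshold gives a set whose estimated FDR does not exceed that threshold.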