Analyzing and engineering cellular signaling processes requires accurate estimation of cellular subprocesses such as protein folding. We apply parametric and nonparametric classification to the problem of assessing three-dimensional protein domain structure predictions generated by the Rosetta ab initio structure prediction method. The assessment is based on whether a predicted structure is similar enough to a known protein structure to be classified as belonging to the same protein superfamily. We develop appropriate features and apply Gaussian mixture models, k-nearest neighbors, and the recently developed linear interpolation with maximum entropy (LIME) method. The proposed learning methods outperform a previous quality assessment method based on generalized linear models. Results show that the proposed methods reject the vast majority of poor structural predictions while identifying a useful number of good predictions.