Right-handed single-stranded β-helix proteins, characterized as virulence factors, allergens and toxins, are a threat to human health. Identifying these proteins from primary sequence is of great importance in biomedicine and medical microbiology. In this paper, a support vector machine (SVM) is used to predict the presence of the β-helix fold in protein sequences from their dipeptide composition. An input vector of 400 dimensions is used to search for the conserved secondary-structure elements, called rungs, found in β-helix proteins. A maximum accuracy of 90.1% and a Matthews correlation coefficient of 0.77 are obtained in a 5-fold cross-validation procedure. In addition, a position-specific scoring matrix (PSSM) is used to score putative rung sequences identified by the SVM. Finally, the predicted β-helix proteins are threaded against a custom β-helix template library to achieve high prediction confidence. The method recognizes right-handed β-helices with 100% sensitivity and 99.8% specificity on a test set of known protein structures.
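As an illustration of the 400-dimensional input, the following is a minimal sketch of how a dipeptide composition vector could be computed from a sequence. It assumes the 20 standard amino acids in alphabetical one-letter order, overlapping dipeptides, and normalization by the number of valid dipeptides; the function name `dipeptide_composition` and these choices are assumptions for illustration, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); this ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20 x 20 = 400 ordered residue pairs, each mapped to a vector index.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide fractions for a protein sequence.

    Overlapping dipeptides are counted; pairs containing a non-standard
    residue (e.g. 'X') are skipped. Fractions are normalized by the number
    of valid dipeptides, so they sum to 1 for any sequence with at least one.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]
```

A vector of this form, computed per sequence or per sequence window, is the kind of fixed-length feature an SVM can be trained on regardless of protein length.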