Identifying whether an uncharacterized protein is a virulent protein is important. If it is, that knowledge is very useful both for studying virulence mechanisms in pathogens and for designing antiviral drugs. In particular, with a large number of virulent protein sequences discovered in recent years, there is an urgent need for an automated method to predict bacterial virulent proteins. In this work, a sequence encoding scheme that combines DC (Dipeptide Composition) and PseAA (Pseudo Amino Acid composition) is introduced to represent protein samples. However, this encoding scheme yields a very high-dimensional feature vector. A DR (Dimensionality Reduction) algorithm, MVP (Maximum Variance Projection), is therefore introduced to extract the key features from the high-dimensional space and reduce the original vector to a lower-dimensional one. Finally, the jackknife test results thus obtained are quite encouraging and indicate that the method can deal effectively with the complicated problem of predicting virulent proteins in bacterial pathogens.
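To make the encoding concrete, the sketch below computes only the DC part of the feature vector: the normalized frequency of each of the 20 × 20 = 400 ordered pairs of adjacent standard amino acids. That 400-dimensional block alone, before any PseAA components are appended, shows why the combined vector is high-dimensional enough to motivate a reduction step such as MVP. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dipeptide_composition`, the choice to skip pairs containing non-standard residues, and normalizing by the number of counted pairs are all assumptions. The PseAA and MVP steps are not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional dipeptide composition vector (hypothetical helper).

    Assumption: pairs containing non-standard residues (e.g. X, B, Z) are
    skipped, and counts are divided by the number of pairs actually counted,
    so the vector sums to 1 for any sequence with at least one valid pair.
    """
    seq = sequence.upper()
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is None:  # pair contains a non-standard residue
            continue
        counts[j] += 1
        total += 1
    if total == 0:
        return counts  # all zeros: no valid dipeptides
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
    print(len(vec), vec[INDEX["AC"]], vec[INDEX["CA"]])
```

For "ACAC" the window yields AC twice and CA once, so those two entries are 2/3 and 1/3 and the other 398 are zero. In the scheme described above, a vector like this would be concatenated with the PseAA components before dimensionality reduction.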