A method is disclosed for building a sequence-activity model with reference to structural data; the model can be used to guide the directed evolution of proteins with beneficial properties. Some embodiments use genetic algorithms together with structural data to filter out data that carry no informational value. Some embodiments train the sequence-activity model with a support vector machine. These filtering and training methods can produce sequence-activity models with higher predictive power than conventional modeling methods. A system and a computer program product for implementing the method are also provided.
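As an illustration only, the filtering idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed method: it replaces the genetic-algorithm search with a plain distance cutoff, and the toy variant library, the per-position distances, the cutoff value, and the helper names `informative_positions` and `encode` are all hypothetical. It keeps only sequence positions that vary across the library and lie close to a site of interest in the structure, then one-hot encodes those positions as features.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def informative_positions(
    variants: List[str], distances: List[float], cutoff: float
) -> List[int]:
    """Return positions that vary across the library and lie within
    `cutoff` of the site of interest (assumed structural criterion)."""
    keep = []
    for pos in range(len(variants[0])):
        residues = {seq[pos] for seq in variants}
        # A position with one residue everywhere cannot explain activity
        # differences; a distant position is assumed to be uninformative.
        if len(residues) > 1 and distances[pos] <= cutoff:
            keep.append(pos)
    return keep


def encode(seq: str, positions: List[int]) -> List[float]:
    """One-hot encode the residues found at the retained positions."""
    features: List[float] = []
    for pos in positions:
        features.extend(1.0 if seq[pos] == aa else 0.0 for aa in AMINO_ACIDS)
    return features


# Hypothetical toy data: four variants, measured activities, and the
# distance (in angstroms) of each position from the site of interest.
variants = ["ACDK", "ACEK", "GCDK", "ACDR"]
activities = [1.0, 1.4, 0.7, 1.1]
distances = [12.0, 3.5, 4.2, 15.0]

positions = informative_positions(variants, distances, cutoff=8.0)
X = [encode(seq, positions) for seq in variants]
```

The resulting feature matrix `X` and the `activities` vector could then be passed to a support vector regression implementation to train the sequence-activity model; that step is omitted here to keep the sketch free of third-party dependencies.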