The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.
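The ensemble idea can be sketched in a few lines. This is a minimal illustration only, not the K*Sync scoring function: the two features (residue identity and secondary-structure agreement), the linear gap penalty, the weight grid, and the function names `align`, `ensemble`, and `consensus_pick` are all assumptions made for the example. Each weight setting yields one optimal global alignment by dynamic programming, distinct alignments are collected, and a simple consensus rule picks the alignment whose residue pairs recur most often across the ensemble.

```python
from collections import Counter
from itertools import product


def align(n, m, score, gap=-1.0):
    """Global (Needleman-Wunsch) alignment of n query and m template positions.

    `score(i, j)` scores pairing query residue i with template residue j.
    Returns the aligned (query_index, template_index) pairs as a tuple.
    The linear gap penalty is an assumption of this sketch.
    """
    # dp[i][j] = best score aligning the first i query and first j template residues
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # pair residues i-1 and j-1
                dp[i - 1][j] + gap,                      # gap in the template
                dp[i][j - 1] + gap,                      # gap in the query
            )
    # Trace back from the corner, preferring the diagonal move on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return tuple(reversed(pairs))


def ensemble(query, template, q_ss, t_ss, weight_grid=(0.0, 0.5, 1.0)):
    """Generate distinct alignments, each optimal under one weight setting.

    The two toy features below stand in for the many features the abstract
    mentions; they are hypothetical choices for illustration.
    """
    features = [
        lambda i, j: 1.0 if query[i] == template[j] else -1.0,  # residue identity
        lambda i, j: 1.0 if q_ss[i] == t_ss[j] else -1.0,       # secondary structure
    ]
    alignments = {}
    for weights in product(weight_grid, repeat=len(features)):
        def score(i, j, w=weights):
            return sum(wk * f(i, j) for wk, f in zip(w, features))
        aln = align(len(query), len(template), score)
        # Keep the first weight setting that produced each distinct alignment.
        alignments.setdefault(aln, weights)
    return alignments


def consensus_pick(alignments):
    """Pick the alignment whose pairs are most frequent across the ensemble."""
    counts = Counter(pair for aln in alignments for pair in aln)
    return max(
        alignments,
        key=lambda aln: sum(counts[p] for p in aln) / max(len(aln), 1),
    )
```

Calling `ensemble("ACDEFG", "ACEFG", "HHHEEE", "HHEEE")` returns a dictionary mapping each distinct alignment to the weights that produced it, and `consensus_pick` then selects one of them. A real pipeline would replace the consensus rule with, or combine it with, the burial and energy-based evaluations described above.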