Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix >D = [rij2] containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix >C, that has elements of either 0 or 1 depending on whether the distance rij is greater or less than a cutoff value rcutoff .We have performed spectral decomposition of the distance matrices , in terms of eigenvalues λk and the corresponding eigenvectors >vk and found that it contains at most 5 nonzero terms. A dominant eigenvector is proportional to r2 - the square distance of points from the center of mass, with the next three being the principal components of the system of points. By knowing r2 we can approximate a distance matrix of a protein with an expected RMSD value of about 4.5Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix >C (related to >D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement ().
展开▼
机译:许多结构信息都编码在内部距离中。基于距离矩阵的方法可用于预测蛋白质结构和动力学,以及用于结构优化。我们的方法基于包含蛋白质中残基之间所有平方距离的平方距离矩阵> D strong> = [rij 2 sup>]。该距离矩阵比接触矩阵> C strong>包含更多的信息,接触矩阵的元素为0或1,具体取决于距离rij是大于还是小于截止值rcutoff。距离矩阵展开▼