Journal of Molecular Biology

The PDB is a covering set of small protein structures.


Abstract

Structure comparisons of all representative proteins have been carried out. Using the relative root mean square deviation (RMSD) from native makes it possible to assess the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge. First, proteins with their native fold can be distinguished by their Z-score. Second, and somewhat surprisingly, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; i.e. 24.0% of them have 60% coverage by a template protein with an RMSD below 3.5 Å, and 6.0% have 70% coverage. If the restriction that only proteins with different secondary structure types are aligned is removed, then in a representative benchmark set of proteins of 200 residues or fewer, 93% can be aligned to a single template structure (with an average sequence identity of 9.8%) with an RMSD of less than 4 Å and 79% average coverage. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, unrelated proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when they are combined. The number of proteins required to cover a target protein is very small; e.g. the top ten hit proteins can give 90% coverage below an RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and their implications for protein structure prediction are discussed.
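The coverage figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not describe the alignment procedure or the Z-score calculation, so the sketch assumes each template hit has already been aligned and filtered by an RMSD cutoff, and is represented only by the set of target residue indices it covers. The function names `coverage` and `combined_coverage` and the example data are hypothetical.

```python
def coverage(aligned, target_length):
    """Fraction of target residues covered by a single template alignment."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return len(set(aligned)) / target_length


def combined_coverage(alignments, target_length, top_n=10):
    """Fraction of target residues covered by the union of the first top_n hits."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    covered = set()
    for aligned in alignments[:top_n]:
        covered.update(aligned)
    return len(covered) / target_length


# Hypothetical 100-residue target with three template hits, given as
# 0-based target residue indices (assumed to have passed the RMSD cutoff).
hits = [range(0, 60), range(40, 90), range(85, 100)]

single = coverage(hits[0], 100)          # best single template
combined = combined_coverage(hits, 100)  # union of all three hits
print(single, combined)
```

In this toy case the best single hit covers 60% of the target, while the three hits together cover every residue because they align to different parts of the chain. That is the sense in which combining a few top hits can cover almost the entire molecule.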
