Conference: International Conference on Intelligent Systems for Molecular Biology

Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins


Abstract

In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper, we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure, we defined 4670 clusters of related sequences in this space. Of these clusters, 1421 are centered on a sequence of known structure. All 4670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric. These scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to organize these clusters in a meta-organization. The first uses a graph theory method and clusters the clusters into a hierarchical organization. This organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing tools for sequence analysis. The second uses a variation on a multidimensional scaling technique to embed the clusters in a low-dimensional real space. This last approach resulted in a projection of the protein space onto a 2D plane that provides us with a bird's-eye view of the protein space. Based on this map, we suggest a list of possible target sequences with unknown structure that are likely to adopt new, unknown folds.
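The abstract does not say which graph theory method produces the hierarchical organization. As a rough illustration only, the sketch below uses single-linkage clustering in its graph form: pairwise distances are treated as weighted edges and processed in increasing order (Kruskal-style), and each edge that joins two separate groups is recorded as a merge. The function name `single_linkage_merges`, the cluster labels, and the toy distance matrix are all hypothetical and are not taken from the paper.

```python
def single_linkage_merges(names, dist):
    """Return the merge order of single-linkage clustering.

    Edges are visited in order of increasing distance (Kruskal-style);
    every edge that connects two separate groups is recorded as a merge.
    This is an illustrative stand-in, not the method used in the paper.
    """
    n = len(names)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances as (distance, i, j) edges, smallest first.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    members = {i: [names[i]] for i in range(n)}
    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # both endpoints already belong to the same group
        merges.append((d, sorted(members[ri]), sorted(members[rj])))
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
    return merges


# Hypothetical distances between four clusters labelled A-D.
names = ["A", "B", "C", "D"]
toy_dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]

merges = single_linkage_merges(names, toy_dist)
for height, left, right in merges:
    print(height, left, right)
```

Each recorded merge corresponds to one internal node of the hierarchy, with the distance serving as its height. In the setting described above, the entries of the distance matrix would come from the unified structure and sequence-profile metric between clusters.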
