Published in: Protein Science: A Publication of the Protein Society (U.S. National Institutes of Health literature)

An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences



Abstract

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
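The divisive loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `divisive_cluster`, the `split` and `self_identifies` callables, and the toy integer data are all hypothetical stand-ins. In TuLIP the acceptance test is a DASP2 self-identification search and the members are protein structures; here the test is a simple spread threshold so that the control flow can be run end to end. As a simplification, the sketch also applies the acceptance test to the starting set itself.

```python
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[int, ...]


def divisive_cluster(
    members: Sequence[int],
    split: Callable[[Cluster], List[Cluster]],
    self_identifies: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[int]]:
    """Divide clusters until each member is in an accepted group or is a singlet."""
    accepted: List[Cluster] = []   # clusters that passed the acceptance test
    singlets: List[int] = []       # members left on their own
    pending: List[Cluster] = [tuple(members)]
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1:
            singlets.append(cluster[0])
        elif self_identifies(cluster):
            accepted.append(cluster)
        else:
            parts = [p for p in split(cluster) if p]
            if len(parts) < 2:
                # The split made no progress; stop dividing so the loop terminates.
                singlets.extend(cluster)
            else:
                pending.extend(parts)
    return accepted, singlets


# Hypothetical stand-in for the DASP2 self-identification check:
# a cluster "self-identifies" if its values span at most 5 units.
def toy_self_identifies(cluster: Cluster) -> bool:
    return max(cluster) - min(cluster) <= 5


# Hypothetical stand-in for the candidate-cluster step:
# sort the values and cut at the largest gap.
def toy_split(cluster: Cluster) -> List[Cluster]:
    ordered = sorted(cluster)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return [tuple(ordered[:cut]), tuple(ordered[cut:])]


groups, singlets = divisive_cluster(
    [10, 12, 11, 50, 53, 90], toy_split, toy_self_identifies
)
print(sorted(groups), singlets)
```

In this toy run the six values separate into two accepted groups and one singlet. Replacing the two toy callables with a real candidate-generation step and a real self-identification search is where all of the method's substance lies.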
