Nucleic Acids Research

SUPFAM—a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes



Abstract

Members of a protein superfamily can arise through divergent evolution of homologues whose amino acid sequences retain no significant similarity. Such superfamily relationships are commonly detected only after the three-dimensional structures of the proteins have been determined by X-ray analysis or NMR. The SUPFAM database described here relates pairs of homologous protein families, of either known or unknown structure, in a multiple sequence alignment database. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families to the families in PALI, an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating to one another those Pfam families that could not be associated reliably with a protein superfamily of known structure. The profile matching procedure IMPALA has been used in both steps. The first step identified 1280 Pfam families (out of 2697, i.e. 47%) that are related to a SCOP family, either through a close homologous connection or through a distant relationship, potentially forming new superfamily connections. Using the profiles of the remaining 1417 Pfam families with apparently no structural information, an all-against-all comparison based on sequence-profile matching with IMPALA clustered 67 homologous protein families of Pfam into 28 potential new superfamilies. Expanding groups of related proteins of as yet unknown structure, as proposed in SUPFAM, should help in identifying ‘priority proteins’ for structure determination in structural genomics initiatives, and so extend the coverage of structural information in protein sequence space. For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies, forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
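The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the pairwise IMPALA matches were grouped or which significance threshold was applied, so the single-linkage grouping, the `evalue_cutoff` value, the `cluster_families` function, and the family names below are all assumptions made for illustration, not the authors' procedure.

```python
from collections import defaultdict


def cluster_families(hits, evalue_cutoff=1e-3):
    """Group families into candidate superfamilies by single linkage.

    hits: iterable of (query_family, target_family, evalue) tuples, e.g.
          parsed from an all-against-all profile search (hypothetical input).
    evalue_cutoff: assumed significance threshold; not taken from the paper.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every pair of distinct families joined by a significant match.
    for query, target, evalue in hits:
        if query != target and evalue <= evalue_cutoff:
            union(query, target)

    # Collect members of each connected component.
    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)

    # Report only groups that join at least two families.
    clusters = [sorted(members) for members in groups.values() if len(members) > 1]
    return sorted(clusters, key=lambda members: members[0])


# Hypothetical family identifiers and E-values, for illustration only.
example_hits = [
    ("FAM_A", "FAM_B", 1e-5),
    ("FAM_B", "FAM_C", 1e-4),
    ("FAM_D", "FAM_E", 1e-6),
    ("FAM_F", "FAM_A", 0.5),    # above the assumed cutoff, ignored
    ("FAM_G", "FAM_G", 1e-50),  # self-match, ignored
]
clusters = cluster_families(example_hits)
```

Single linkage is the most permissive choice: one significant match is enough to merge two groups, so a real pipeline would likely need a stricter threshold or manual review to avoid chaining unrelated families together.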
