Journal: Nucleic Acids Research

A rapid classification protocol for the CATH Domain Database to support structural genomics.

Abstract

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
