PLoS Computational Biology

Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models


Abstract

The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the “most wanted list” that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.
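The distinction the abstract draws between domain-level and whole-protein coverage can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Protein` record, the `coverage` function, the toy class labels, and the rule that a functional class counts as covered when any one of its proteins is covered. The abstract does not specify the authors' actual pipeline or data model.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical record: one protein, its functional class, and its domains."""
    name: str
    functional_class: str        # stand-in for an EC number or GO term
    domains: Tuple[str, ...]     # identifiers of the domains in this protein


def coverage(proteins: Iterable[Protein], solved: Set[str]) -> Tuple[float, float]:
    """Return (domain_level, whole_protein) coverage as fractions of functional classes.

    Assumed rule: a class is domain-covered if some protein in it has at least
    one domain in `solved`, and whole-covered if some protein in it has all of
    its domains in `solved`.
    """
    proteins = list(proteins)
    classes = {p.functional_class for p in proteins}
    if not classes:
        return 0.0, 0.0
    domain_covered, whole_covered = set(), set()
    for p in proteins:
        hits = [d in solved for d in p.domains]
        if any(hits):
            domain_covered.add(p.functional_class)
        if hits and all(hits):
            whole_covered.add(p.functional_class)
    return len(domain_covered) / len(classes), len(whole_covered) / len(classes)


# Toy data only; these are not values from the paper.
toy_genome = [
    Protein("P1", "class_A", ("dA", "dB")),
    Protein("P2", "class_B", ("dC",)),
    Protein("P3", "class_C", ("dD", "dE")),
    Protein("P4", "class_D", ("dF",)),
]
experimental = {"dA", "dC", "dD", "dE"}   # domains with experimental structures
homology_models = {"dB"}                  # domains reachable only by modeling

exp_domain, exp_whole = coverage(toy_genome, experimental)
ext_domain, ext_whole = coverage(toy_genome, experimental | homology_models)
print(f"experimental only: domain={exp_domain:.2f}, whole={exp_whole:.2f}")
print(f"with homology:     domain={ext_domain:.2f}, whole={ext_whole:.2f}")
```

In this toy set, adding the modeled domain raises whole-protein coverage without changing domain-level coverage, which mirrors the kind of uneven gain the abstract describes for homology models.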
