
Clustering of proximal sequence space for the identification of protein families.

Abstract

Motivation: The study of sequence space, and the deciphering of the structure of protein families and subfamilies, has so far been required for work in comparative genomics and for the prediction of protein function. With the emergence of structural proteomics projects, it is becoming increasingly important to be able to select protein targets for structural studies that appropriately cover the space of protein sequences, functions and genomic distribution. These problems motivate the development of methods for clustering protein sequences and building families of potentially orthologous sequences, such as those proposed here.

Results: First, we developed a clustering strategy (the Ncut algorithm) capable of forming groups of related sequences by assessing their pairwise relationships. The results presented for the ras superfamily of proteins are similar to those produced by other clustering methods, but are obtained without the need to cluster the full sequence space. The Ncut clusters are then used as the input to a process that reconstructs groups of closely related sequences with equilibrated genomic composition. The results of applying this technique to the data set used in the construction of the COG database are very similar to those derived by the human experts responsible for this database.

Availability: The analysis of different systems, including the COG-equivalent set of 21 genomes, is available at http://www.pdg.cnb.uam.es/GenoClustering.html

Contact: valencia
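The abstract does not spell out how the Ncut algorithm works, so the following is only an illustrative sketch of one common way to cluster items from pairwise relationships: a spectral relaxation of a normalized-cut bipartition. The function names `ncut_bipartition` and `ncut_value`, the toy similarity matrix and the choice to threshold at zero are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def ncut_bipartition(similarity):
    """Split items into two groups via a spectral relaxation of the normalized cut.

    `similarity` is a symmetric, non-negative matrix of pairwise scores
    (hypothetical input; the paper's actual scoring is not described here).
    """
    w = np.asarray(similarity, dtype=float)
    degree = w.sum(axis=1)  # total similarity of each item to all others
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Second-smallest eigenvector, mapped back to the generalized problem
    indicator = d_inv_sqrt * eigenvectors[:, 1]
    # Assumption: split at zero; other thresholds are possible
    return indicator > 0


def ncut_value(similarity, labels):
    """Normalized-cut cost of a two-way partition given as a boolean mask."""
    w = np.asarray(similarity, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cut = w[labels][:, ~labels].sum()  # similarity crossing the partition
    return cut / w[labels].sum() + cut / w[~labels].sum()


# Toy data: six sequences forming two tight groups of three (made-up scores).
similarity = np.full((6, 6), 0.05)
similarity[:3, :3] = 0.9
similarity[3:, 3:] = 0.9
np.fill_diagonal(similarity, 0.0)

labels = ncut_bipartition(similarity)
print(labels, ncut_value(similarity, labels))
```

In this sketch the sign of the eigenvector is arbitrary, so which group is labelled `True` can differ between runs or platforms; only the grouping itself is meaningful. Applying the split recursively to each group would give a hierarchy of clusters, which is one possible reading of how families and subfamilies could be separated.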
