Clustering of proximal sequence space for the identification of protein families.

Abascal F; Valencia A


Abstract

Motivation: The study of sequence space, and the deciphering of the structure of protein families and subfamilies, has until now been essential for work in comparative genomics and for the prediction of protein function. With the emergence of structural proteomics projects, it is becoming increasingly important to select protein targets for structural studies that appropriately cover the space of protein sequences, functions and genomic distribution. These problems motivate the development of methods for clustering protein sequences and building families of potentially orthologous sequences, such as those proposed here.

Results: First, we developed a clustering strategy (the Ncut algorithm) that forms groups of related sequences by assessing their pairwise relationships. The results presented for the ras superfamily of proteins are similar to those produced by other clustering methods, but without the need to cluster the full sequence space. The Ncut clusters are then used as input to a process that reconstructs groups of closely related sequences with equilibrated genomic composition. The results of applying this technique to the data set used in the construction of the COG database are very similar to those derived by the human experts responsible for that database.

Availability: The analysis of different systems, including the COG-equivalent 21 genomes, is available at http://www.pdg.cnb.uam.es/GenoClustering.html

Contact: valencia
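The abstract does not spell out how the Ncut step works, so the following is only a minimal sketch of the general normalized-cut criterion from graph partitioning, not the authors' implementation. It assumes the standard definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut(A, B) sums the pairwise similarities that cross the two groups and assoc(X, V) sums all similarities attached to group X. The similarity matrix, the function names ncut_value and best_bipartition, and the brute-force search are all illustrative assumptions; exhaustive search is only feasible for a handful of sequences.

```python
from itertools import combinations

# Hypothetical symmetric pairwise similarity scores for six sequences.
# Sequences 0-2 and 3-5 are strongly related within each group and only
# weakly related across groups. The values are made up for illustration.
similarity = [
    [0, 9, 8, 1, 0, 0],
    [9, 0, 7, 0, 1, 0],
    [8, 7, 0, 0, 0, 1],
    [1, 0, 0, 0, 9, 8],
    [0, 1, 0, 9, 0, 7],
    [0, 0, 1, 8, 7, 0],
]


def ncut_value(weights, group_a):
    """Normalized cut of the split (group_a, all remaining nodes)."""
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    # Total similarity crossing the split.
    cut = sum(weights[i][j] for i in a for j in b)
    # Total similarity attached to each side (within-group plus crossing).
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # degenerate split, never preferred
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Exhaustively search for the two-way split with the lowest Ncut."""
    n = len(weights)
    best_score, best_group = float("inf"), None
    # Node 0 is pinned to group A so each split is examined only once.
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            group_a = (0,) + rest
            score = ncut_value(weights, group_a)
            if score < best_score:
                best_score, best_group = score, group_a
    return best_score, best_group


score, group_a = best_bipartition(similarity)
group_b = tuple(i for i in range(len(similarity)) if i not in group_a)
```

On this toy matrix the lowest-scoring split separates sequences 0-2 from sequences 3-5, because only three weak links cross that boundary while each side keeps its strong internal links. Normalizing the cut by each side's total association is what discourages splitting off a single sequence.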


Bibliographic record

Source: Bioinformatics, 2002, issue 7, 14 pages
Authors: Abascal F; Valencia A
Original format: PDF
Language: English
Chinese Library Classification: Biological sciences; Bioengineering (biotechnology)
Keywords: Proteins; Databases; Motivation; Cluster Analysis

