
SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale



Abstract

Background: An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming, and hence are inaccessible to large parts of the research community.

Results: SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters, as indicated by their F-scores, is consistently better than that of all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences).

Conclusions: Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering. It integrates TribeMCL, provides different cluster quality tools, can extract human-readable protein descriptions using GI numbers from NCBI, interfaces with external tools such as BLAST and Cytoscape, and can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
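
The abstract lists connected component analysis among the baseline methods that SCPS implements alongside the spectral method. As a rough illustration of how clusters can be read off pairwise similarities alone, here is a minimal Python sketch of that baseline. It is not SCPS code: the function name, the dictionary layout, the protein labels, the scores and the threshold are all assumptions made for this example.

```python
from collections import defaultdict


def connected_components(similarities, threshold):
    """Group items joined by a chain of similarities >= threshold.

    `similarities` maps (a, b) pairs to a score. Both this layout and
    the threshold are assumptions made for the illustration.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)  # also registers both items
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical similarity scores (higher means more similar).
toy = {
    ("P1", "P2"): 0.9,
    ("P2", "P3"): 0.8,
    ("P3", "P4"): 0.1,
    ("P4", "P5"): 0.7,
}
clusters = connected_components(toy, threshold=0.5)
print(clusters)
```

With a threshold of 0.5 the toy data splits into two groups, P1 to P3 and P4 to P5. Lowering the threshold to 0.05 merges all five proteins into one cluster, because a single weak link is enough to join two groups in this kind of analysis.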
