Journal: Evolutionary Bioinformatics

Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure


Abstract

Protein databases are growing rapidly, so it is increasingly important to cluster protein sequences based on sequence information alone. In this paper we improve the similarity measure proposed by Kelil et al., cluster sequences using the affinity propagation (AP) algorithm, and provide a method for choosing the input preference of the AP algorithm. We tested our method extensively and compared its performance with that of four other methods on several datasets from the COG, G protein, CAZy, and SCOP databases. We consistently observed that the number of clusters obtained for a given set of proteins approximates the correct number of clusters in that set. Moreover, in our experiments, the quality of the clusters as quantified by the F-measure was better than that of the other algorithms (on average, 15% better than BlastClust, 56% better than TribeMCL, 23% better than CLUSS, and 42% better than spectral clustering).
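To make the clustering step concrete, the following is a minimal sketch of standard affinity propagation on a precomputed similarity matrix. It is not the authors' implementation. The function name `affinity_propagation`, the toy matrix `sim`, and its scores are hypothetical, and the scores are not produced by the similarity measure of Kelil et al. or its improved version. The preference defaults to the median of the off-diagonal similarities, a common default for AP that is assumed here for illustration; it is not the preference-selection method the paper proposes.

```python
from statistics import median


def affinity_propagation(sim, preference=None, damping=0.5, max_iter=200):
    """Return, for each item, the index of its exemplar (or -1 if none emerged)."""
    n = len(sim)
    S = [list(row) for row in sim]
    if preference is None:
        # Assumption: median off-diagonal similarity, not the paper's method.
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference  # the diagonal holds the input preference

    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new

        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [-1] * n  # no exemplar emerged within max_iter
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Hypothetical pairwise scores for six sequences forming two families.
# Diagonal entries are placeholders; they are overwritten by the preference.
sim = [
    [0.0, 0.9, 0.7, 0.1, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1, 0.1],
    [0.7, 0.9, 0.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.9, 0.7],
    [0.1, 0.1, 0.1, 0.9, 0.0, 0.9],
    [0.1, 0.1, 0.1, 0.7, 0.9, 0.0],
]
labels = affinity_propagation(sim)
print(labels)
```

Lowering `preference` generally yields fewer clusters and raising it yields more, which is why the choice of preference matters for recovering a sensible number of clusters. This sketch runs a fixed number of iterations and omits the convergence check and tie-breaking noise that production implementations typically add.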
