Journal of Biomedical Science and Engineering

FastCluster: a graph theory based algorithm for removing redundant sequences


Abstract

Biological sequence databases often contain redundant sequences, which make reliable statistical analysis difficult. Removing redundant sequences to find all the real protein families and their representatives in a large sequence dataset is therefore an important task in bioinformatics. The problem of removing redundant protein sequences can be modeled as finding the maximum independent set of a graph, which is an NP-hard problem. This paper presents a novel program named FastCluster that is based on graph theory. The algorithm improves on Hobohm and Sander's algorithm for generating non-redundant protein sequence sets. FastCluster uses BLAST to determine the similarity between two sequences in order to obtain a better similarity measure. The algorithm's performance is compared with that of Hobohm and Sander's algorithm, and the comparison shows that FastCluster produces a reasonable non-redundant protein set and accepts any similarity cut-off from 0.0 to 1.0. The proposed algorithm generates a larger maximal non-redundant (independent) protein set, which is closer to the true result, the maximum independent set of the graph, where every protein family is clustered. This makes FastCluster a valuable tool for removing redundant protein sequences.
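The graph formulation in the abstract can be illustrated with a minimal sketch. It is not the FastCluster implementation, which the abstract does not describe in detail. It assumes that each sequence is a vertex, that an edge joins two sequences whose similarity reaches the cut-off, and that representatives are chosen by a common greedy heuristic (keep a lowest-degree vertex, discard its neighbours). The names `toy_similarity`, `build_redundancy_graph` and `greedy_independent_set` are hypothetical, and `toy_similarity` is only a stand-in for a BLAST-derived score.

```python
from itertools import combinations


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a BLAST-derived score: the fraction of
    matching positions over the longer length (no alignment is performed)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def build_redundancy_graph(seqs, cutoff, similarity=toy_similarity):
    """Connect two sequences when their similarity reaches the cut-off."""
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if similarity(seqs[u], seqs[v]) >= cutoff:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def greedy_independent_set(graph):
    """Repeatedly keep a lowest-degree vertex and discard its neighbours.

    The result is maximal (no further vertex can be added) but is not
    guaranteed to be the maximum independent set.
    """
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    kept = []
    while remaining:
        # Ties are broken by name so the output is deterministic.
        v = min(remaining, key=lambda n: (len(remaining[n]), n))
        kept.append(v)
        removed = remaining[v] | {v}
        for r in removed:
            remaining.pop(r, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return kept


if __name__ == "__main__":
    seqs = {
        "s1": "MKTAYIAKQR",
        "s2": "MKTAYIAKQK",
        "s3": "MKTAYIAKQL",
        "s4": "GGSLPWWDEE",
    }
    graph = build_redundancy_graph(seqs, cutoff=0.8)
    print(greedy_independent_set(graph))
```

In this toy input, `s1`, `s2` and `s3` differ only in their last residue and form one redundant group, while `s4` is unrelated, so one representative per group is kept. Preferring low-degree vertices tends to yield larger sets than taking sequences in input order, for example on a star-shaped graph where the hub would otherwise eliminate every leaf.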
