BMC Bioinformatics

CLUSS: Clustering of protein sequences based on a new similarity measure

Abstract

Background

The rapid growth of available protein data makes clustering within protein families increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. Such identification reveals phylogenetic relationships, which provide prior knowledge that helps researchers understand biological phenomena. A good evolutionary model is essential for a clustering that reflects biological reality, and an accurate estimate of protein sequence similarity is crucial for building such a model. Most existing algorithms estimate this similarity with techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which pose many difficulties for alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, called Substitution Matching Similarity (SMS), is designed specifically for non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".
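The abstract does not spell out how SMS scores matching subsequences, so the sketch below does not reproduce SMS or CLUSS. As a hedged illustration of the general idea of comparing proteins without aligning them, it swaps in a generic, well-known technique: cosine similarity over counts of overlapping length-k subsequences (k-mers), followed by single-linkage clustering at a similarity threshold. The function names (`kmer_profile`, `kmer_similarity`, `cluster`), the choice k = 3, the threshold of 0.5, and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping length-k subsequence (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Cosine similarity of k-mer count vectors; no alignment is computed.

    This is a generic alignment-free measure, NOT the paper's SMS.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    # Counter returns 0 for k-mers absent from pb, so only shared k-mers contribute.
    dot = sum(count * pb[kmer] for kmer, count in pa.items())
    norm = sqrt(sum(c * c for c in pa.values())) * sqrt(sum(c * c for c in pb.values()))
    return dot / norm if norm else 0.0


def cluster(seqs: dict[str, str], threshold: float, k: int = 3) -> list[set[str]]:
    """Single-linkage clustering: link any pair whose similarity >= threshold.

    A simple stand-in for illustration; not CLUSS's clustering procedure.
    """
    parent = {name: name for name in seqs}  # union-find forest over sequence names

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if kmer_similarity(seqs[a], seqs[b], k) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up toy strings, not real protein records.
    toy = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSRE",  # differs from p1 only in the last residue
        "p3": "GSHMLEDPVDAFQLKK",        # shares no 3-mer with p1 or p2
    }
    print(round(kmer_similarity(toy["p1"], toy["p2"]), 2))  # prints 0.95
    print(sorted(sorted(c) for c in cluster(toy, threshold=0.5)))  # prints [['p1', 'p2'], ['p3']]
```

Exact k-mer matching gives no credit for similar but non-identical residues, and its results depend on the chosen k. The name Substitution Matching Similarity suggests that SMS accounts for amino acid substitutions when matching subsequences; the precise definition is given in the full paper.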