Journal: Journal of computational biology: A journal of computational molecular cell biology

Supervised Protein Family Classification and New Family Construction



Abstract

The goal of protein family classification is to group proteins into families so that proteins within the same family share a common function or are related by ancestry. While supervised classification algorithms are available for this purpose, most of these approaches focus on assigning unclassified proteins to known families and do not allow for progressive construction of new families from proteins that cannot be assigned. Unsupervised clustering algorithms are also available, but they do not make use of information from known families. By computing similarities between proteins based on pairwise sequence comparisons, we develop supervised classification algorithms that achieve improved accuracy over previous approaches while allowing for construction of new families. We show that our algorithm has a higher accuracy rate and a lower misclassification rate than algorithms based on multiple sequence alignments and hidden Markov models, and that it performs well even on families with very few proteins and on families with low sequence similarity. A software program implementing the algorithm (SClassify) is available online (http://faculty.cse.tamu.edu/shsze/sclassify).
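
The general idea of assigning proteins to known families by pairwise similarity, while opening a new family for proteins that cannot be assigned, can be illustrated with a minimal sketch. This is not the SClassify algorithm: the abstract does not specify the similarity measure or the decision rule, so the k-mer Jaccard score, the `threshold` value, and all function names below are assumptions chosen only for illustration.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets.

    Assumption: a crude stand-in for a real pairwise sequence comparison
    (for example, an alignment score), which the abstract does not detail.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def classify(
    families: Dict[str, List[str]],
    queries: List[str],
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, List[str]]:
    """Assign each query to its most similar family, or start a new family.

    Newly created families take part in later assignments, so new families
    grow progressively as more unassignable proteins arrive.
    """
    # Copy the input so the caller's known families are left unchanged.
    result = {name: list(members) for name, members in families.items()}
    new_count = 0
    for q in queries:
        best_name, best_score = None, 0.0
        for name, members in result.items():
            # Score a family by its single most similar member.
            score = max((similarity(q, m) for m in members), default=0.0)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            result[best_name].append(q)
        else:
            new_count += 1
            result[f"new_{new_count}"] = [q]
    return result
```

Calling `classify` with a dictionary of known families and a list of unclassified sequences returns an updated dictionary in which unassignable sequences appear under generated names such as `new_1`. Scoring a family by its best-matching member is one simple choice among many; a real method would likely use alignment-based scores and a more careful assignment rule.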
