UPSEC : an algorithm for classifying unaligned protein sequences into functional families

Abstract

To classify proteins into functional families based on their primary sequences, popular algorithms such as the k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be performed very accurately. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovers during the training process can be biologically meaningful.
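As a rough illustration of alignment-free, multi-class classification with a single counting pass over the training data, here is a minimal sketch. It is not the UPSEC algorithm: the abstract does not specify its probabilistic measure, so this sketch substitutes a smoothed log-odds weight on short residue substrings (k-mers). The function names, the choice of `k=2`, the add-one smoothing, and the toy sequences and family labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=2):
    """Return the set of overlapping k-residue substrings of seq (no alignment needed)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(samples, k=2):
    """Single pass over (sequence, family) pairs.

    Counts, for each family, how many of its sequences contain each k-mer.
    """
    present = defaultdict(Counter)  # family -> k-mer -> number of sequences containing it
    totals = Counter()              # family -> number of training sequences
    for seq, family in samples:
        totals[family] += 1
        present[family].update(kmers(seq, k))
    return present, totals


def weight(kmer, family, present, totals):
    """Smoothed log-odds of kmer appearing in `family` (positive samples)
    versus all other families (negative samples).

    Assumption: this stands in for the unspecified measure in the abstract.
    """
    others = [f for f in totals if f != family]
    in_pos, n_pos = present[family][kmer], totals[family]
    in_neg = sum(present[f][kmer] for f in others)
    n_neg = sum(totals[f] for f in others)
    p_pos = (in_pos + 1) / (n_pos + 2)
    p_neg = (in_neg + 1) / (n_neg + 2)
    return math.log(p_pos / p_neg)


def classify(seq, present, totals, k=2):
    """Score every family with one shared model and return the best-scoring one."""
    found = kmers(seq, k)
    scores = {f: sum(weight(m, f, present, totals) for m in found) for f in totals}
    return max(scores, key=scores.get)


# Toy, made-up sequences and labels, for illustration only.
training = [
    ("HHCCHH", "famA"),
    ("CCHHCA", "famA"),
    ("GKSTGK", "famB"),
    ("AGKSTL", "famB"),
]
present, totals = train(training)
print(classify("LLHHCC", present, totals))
print(classify("TGKSA", present, totals))
```

K-mers that are frequent in one family and rare in the others get a positive weight for that family and a negative weight elsewhere, so both positive and negative training samples contribute to every score.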
Bibliographic record

  • Authors

    Ma PCH; Chan KCC

  • Year: 2008
  • Original format: PDF
  • Language: English