Journal: BMC Bioinformatics

Building multiclass classifiers for remote homology detection and fold recognition

Abstract

Background: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines (SVMs) are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems.

Results: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes.

Conclusion: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. The schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.
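The two kinds of error mentioned in the Conclusion (a superfamily assigned to a different fold, a fold assigned to a different SCOP class) can be made concrete with a small sketch. It is illustrative only and is not the authors' evaluation code: the function names `error_level` and `summarize` and the example labels are hypothetical, and it assumes labels follow the SCOP `class.fold.superfamily` identifier convention (for example `a.1.1`).

```python
from collections import Counter


def error_level(true_label: str, pred_label: str) -> str:
    """Return the highest SCOP level at which a prediction disagrees.

    Labels are assumed to look like "a.1.1" (class.fold.superfamily).
    Fold numbers are only meaningful within a class, so the class is
    compared first, then the fold, then the superfamily.
    """
    t = true_label.split(".")[:3]
    p = pred_label.split(".")[:3]
    if t == p:
        return "correct"
    if t[0] != p[0]:
        return "wrong_class"
    if t[1] != p[1]:
        return "wrong_fold"
    return "wrong_superfamily"


def summarize(pairs):
    """Compute the overall error rate and a per-level error count."""
    counts = Counter(error_level(t, p) for t, p in pairs)
    n = len(pairs)
    error_rate = 1 - counts["correct"] / n if n else 0.0
    return error_rate, counts


# Hypothetical (true, predicted) label pairs, for illustration only.
pairs = [
    ("a.1.1", "a.1.1"),  # correct
    ("a.1.1", "a.1.2"),  # same fold, different superfamily
    ("a.1.1", "a.4.5"),  # same class, different fold
    ("b.1.1", "c.1.1"),  # different class
]
error_rate, counts = summarize(pairs)
print(error_rate, dict(counts))
```

Reporting the per-level counts alongside the overall error rate separates near misses (a wrong superfamily within the right fold) from predictions that land in an entirely different fold or class.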