Expert Systems with Applications

Clustering support vector machines for protein local structure prediction

Abstract

Understanding the sequence-to-structure relationship is a central task in bioinformatics research. Adequate knowledge of this relationship can potentially improve the accuracy of local protein structure prediction. One approach to protein local structure prediction uses conventional clustering algorithms to capture the sequence-to-structure relationship. However, the cluster membership function defined by conventional clustering algorithms may not adequately reveal this complex nonlinear relationship. In contrast, a Support Vector Machine (SVM) can capture the nonlinear sequence-to-structure relationship by mapping the input space into a higher-dimensional feature space. SVMs, however, scale poorly to huge datasets containing millions of samples. We therefore propose a novel computational model called Clustering Support Vector Machines (CSVMs). Taking advantage of both granular computing theory and advanced statistical learning methodology, a separate CSVM is built for each information granule partitioned by the clustering algorithm. This makes the learning task for each CSVM more specific and simpler. Because the CSVMs for different granules are modeled independently, they can easily be parallelized, so CSVMs can handle complex classification problems on huge datasets. The average accuracy of CSVMs is over 80%, which indicates that their generalization power is strong enough to recognize the complicated patterns of sequence-to-structure relationships. Our experimental results show that, compared with the conventional clustering algorithm, applying CSVMs noticeably improves the accuracy of local structure prediction.
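The abstract describes CSVMs only at a high level: partition the data into information granules with a clustering algorithm, train one SVM per granule, and rely on the per-granule models being independent. The Python sketch below illustrates that structure under several assumptions that do not come from the paper: k-means with farthest-first seeding stands in for the clustering step, a linear SVM trained with Pegasos-style stochastic subgradient descent replaces the nonlinear (kernel) SVM the abstract implies, each granule's features are centered on its centroid, and the ±1 labels and toy data are invented. All names (`kmeans`, `LinearSVM`, `ClusteringSVMs`) are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:  # squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[Vector], p: Vector) -> int:  # index of closest centroid
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:  # next seed: point farthest from all current seeds
        centroids.append(list(max(points, key=lambda p: min(sq_dist(c, p) for c in centroids))))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty group keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class LinearSVM:
    """Linear soft-margin SVM (stand-in for a kernel SVM) trained with Pegasos-style steps.
    A constant 1.0 feature is appended, so the bias is learned (and regularized) with w."""

    def __init__(self, lam: float = 0.01, epochs: int = 100, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.w: Vector = []
        self.constant: Optional[int] = None  # set when a granule contains one class only

    def fit(self, X: List[Vector], y: List[int]) -> "LinearSVM":
        if len(set(y)) == 1:
            self.constant = y[0]
            return self
        data = [x + [1.0] for x in X]
        self.w = [0.0] * len(data[0])
        rng, order, t = random.Random(self.seed), list(range(len(data))), 0
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y[i] * sum(wj * xj for wj, xj in zip(self.w, data[i]))
                self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]  # L2 shrink
                if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                    self.w = [wj + eta * y[i] * xj for wj, xj in zip(self.w, data[i])]
        return self

    def predict(self, x: Vector) -> int:
        if self.constant is not None:
            return self.constant
        return 1 if sum(wj * xj for wj, xj in zip(self.w, x + [1.0])) >= 0 else -1


class ClusteringSVMs:
    """Hypothetical CSVM-style model: k-means granules, one linear SVM per granule."""

    def __init__(self, k: int, **svm_kwargs):
        self.k, self.svm_kwargs = k, svm_kwargs
        self.centroids: List[Vector] = []
        self.models: List[LinearSVM] = []

    def fit(self, X: List[Vector], y: List[int]) -> "ClusteringSVMs":
        centroids = kmeans(X, self.k)
        granules: List[List[Tuple[Vector, int]]] = [[] for _ in centroids]
        for x, label in zip(X, y):
            granules[nearest(centroids, x)].append((x, label))
        # Granules are trained independently, so this loop is embarrassingly parallel.
        for c, members in zip(centroids, granules):
            if not members:
                continue  # drop granules that ended up empty
            Xc = [[a - b for a, b in zip(x, c)] for x, _ in members]  # center on granule
            self.centroids.append(c)
            self.models.append(LinearSVM(**self.svm_kwargs).fit(Xc, [lb for _, lb in members]))
        return self

    def predict(self, x: Vector) -> int:
        j = nearest(self.centroids, x)  # route the sample to its granule's SVM
        return self.models[j].predict([a - b for a, b in zip(x, self.centroids[j])])


# Toy data (made up): granule A uses label = sign(x); granule B uses label = sign(y - 10).
X = [[float(dx), float(dy)] for dx in (-2, -1, 1, 2) for dy in (-1, 0, 1)]
y = [1 if x0 > 0 else -1 for x0, _ in X]
B = [[10.0 + dx, 10.0 + dy] for dx in (-1, 0, 1) for dy in (-2, -1, 1, 2)]
X, y = X + B, y + [1 if y1 > 10 else -1 for _, y1 in B]

model = ClusteringSVMs(k=2).fit(X, y)
queries = [[1.5, 0.0], [-1.5, 0.0], [10.0, 11.5], [10.0, 8.5]]
print([model.predict(q) for q in queries])  # → [1, -1, 1, -1]
```

In the toy data the two granules follow different decision rules, and each per-granule SVM only has to learn its own simple rule, which mirrors the abstract's point that granulation makes each learning task "more specific and simpler." Because the per-granule `fit` calls share no state, they could be dispatched to separate workers (for example with `concurrent.futures.ProcessPoolExecutor`). Replacing `LinearSVM` with a kernel SVM would restore the nonlinear mapping the abstract describes.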