
Clustering system and clustering support vector machine for local protein structure prediction.


Abstract

Protein tertiary structure plays a very important role in determining a protein's possible functional sites and its chemical interactions with other related proteins. Experimental methods for determining protein structure are time-consuming and expensive. As a result, high-throughput sequencing techniques have substantially widened the gap between known protein sequences and known structures. These limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction.

In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. The carefully constructed sequence clusters are then used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within a sequence cluster may influence its structural similarity.

Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity, while sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and that high-quality clusters give reliable predictions.

To improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively. We therefore consider using a Support Vector Machine (SVM) to capture this nonlinear relationship. However, SVM training is not well suited to huge datasets containing millions of samples, so we propose CSVMs instead. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that the accuracy of local structure prediction improves noticeably when CSVMs are applied.
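The idea of training one SVM per information granule can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not specify the feature encoding, the improved K-means variant, or the SVM kernel. The sketch assumes toy 2-D vectors standing in for encoded sequence segments, labels of +1/-1 standing in for a structural class, plain Lloyd's K-means, and a linear SVM trained by stochastic subgradient descent on the hinge loss. All function names (`kmeans`, `train_linear_svm`, `train_csvm`, `predict_csvm`) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (an assumption; the thesis uses an improved variant)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100, seed=0):
    """Linear SVM via SGD on the L2-regularized hinge loss; labels are +1/-1.
    The last weight is the bias (regularized too, for simplicity)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = X[i] + [1.0]
            violated = y[i] * dot(w, xi) < 1
            w = [wj * (1 - eta * lam) for wj in w]  # regularization shrink
            if violated:                            # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def train_csvm(X, y, k):
    """Partition the data with K-means, then fit one local SVM per granule."""
    centroids = kmeans(X, k)
    models = []
    for c_idx, c in enumerate(centroids):
        members = [i for i, p in enumerate(X) if nearest(p, centroids) == c_idx]
        # Each local model sees offsets from its own centroid.
        local_X = [[a - b for a, b in zip(X[i], c)] for i in members]
        local_y = [y[i] for i in members]
        models.append(train_linear_svm(local_X, local_y) if members else None)
    return centroids, models

def predict_csvm(p, centroids, models):
    """Route p to its nearest granule and apply that granule's SVM."""
    c_idx = nearest(p, centroids)
    offset = [a - b for a, b in zip(p, centroids[c_idx])] + [1.0]
    return 1 if dot(models[c_idx], offset) >= 0 else -1

# Toy data: two well-separated granules whose local decision rules are opposite,
# so no single linear boundary fits both.
X = [[-1.0, 0.2], [-1.2, -0.1], [1.0, 0.1], [1.1, -0.2],
     [9.0, 10.2], [8.8, 9.9], [11.0, 10.1], [11.2, 9.8]]
y = [-1, -1, 1, 1, 1, 1, -1, -1]

centroids, models = train_csvm(X, y, k=2)
predictions = [predict_csvm(p, centroids, models) for p in X]
print(predictions)
```

Because each local model only has to fit its own granule, the per-granule training sets stay small, which reflects the scalability argument in the abstract. In this toy setup, the combination of simple local boundaries can also represent a relationship that is nonlinear over the whole dataset.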

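Bibliographic record

  • Author: Zhong, Wei
  • Author affiliation: Georgia State University
  • Degree-granting institution: Georgia State University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 172 p.
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology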
