Protein secondary structure prediction using neural networks and support vector machines

Abstract

Predicting the secondary structure of proteins is important in biochemistry because the 3D structure can be determined from the local folds that are found in secondary structures. Moreover, knowing the tertiary structure of proteins can assist in determining their functions. The objective of this thesis is to compare the performance of Neural Networks (NN) and Support Vector Machines (SVM) in predicting the secondary structure of 62 globular proteins from their primary sequence. For both NN and SVM, we created six binary classifiers to distinguish between the classes helix (H), strand (E), and coil (C). For NN we use Resilient Backpropagation training with and without early stopping. We use NN with either no hidden layer or one hidden layer with 1, 2, ..., 40 hidden neurons. For SVM we use a Gaussian kernel with its kernel parameter fixed at 0.1 and a cost parameter C varying in the range [0.1, 5]. 10-fold cross-validation is used to obtain overall estimates of the probability of making a correct prediction. Our experiments indicate that, for both NN and SVM, the different binary classifiers have varying accuracies: from 69% correct predictions for coil vs. non-coil up to 80% correct predictions for strand vs. non-strand. It is further demonstrated that an NN with no hidden layer, or with no more than 2 neurons in the hidden layer, is sufficient; more hidden neurons are not needed for better predictions. For SVM we show that the estimated accuracies do not depend on the value of the cost parameter. As a major result, we demonstrate that the accuracy estimates of the NN and SVM binary classifiers cannot be distinguished from one another. This contradicts a current belief in bioinformatics that SVM outperforms other predictors.
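The evaluation setup described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis code: the abstract does not enumerate the six binary classifiers, so the sketch assumes the common choice of three one-vs-rest tasks (for example coil vs. non-coil) plus three one-vs-one tasks. The names `binary_tasks`, `k_fold_indices`, `cv_accuracy`, and `majority_baseline` are hypothetical, and the majority-class baseline merely stands in for a trained NN or SVM.

```python
from itertools import combinations

CLASSES = ("H", "E", "C")  # helix, strand, coil


def binary_tasks(labels):
    """Return {task_name: [(position, target), ...]} for six binary problems.

    Assumption (not stated in the abstract): three one-vs-rest tasks
    plus three one-vs-one tasks.
    """
    tasks = {}
    # One-vs-rest: every residue is kept; target is 1 for the named class.
    for c in CLASSES:
        tasks[f"{c}/~{c}"] = [(i, int(y == c)) for i, y in enumerate(labels)]
    # One-vs-one: only residues from the two classes are kept; target is 1 for the first.
    for a, b in combinations(CLASSES, 2):
        tasks[f"{a}/{b}"] = [
            (i, int(y == a)) for i, y in enumerate(labels) if y in (a, b)
        ]
    return tasks


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_baseline(train):
    """Placeholder for an NN or SVM: always predicts the most frequent training target."""
    ones = sum(target for _, target in train)
    guess = int(2 * ones >= len(train))
    return lambda position: guess


def cv_accuracy(examples, fit, k=10):
    """Pooled k-fold estimate of the probability of a correct prediction."""
    correct = 0
    for test_idx in k_fold_indices(len(examples), k):
        held_out = set(test_idx)
        train = [ex for j, ex in enumerate(examples) if j not in held_out]
        predict = fit(train)  # fit on the other k-1 folds only
        correct += sum(predict(examples[j][0]) == examples[j][1] for j in test_idx)
    return correct / len(examples)


if __name__ == "__main__":
    toy_labels = "HHHHEEECCCCCCCCCHHHH"  # made-up labels, not thesis data
    for name, examples in binary_tasks(toy_labels).items():
        print(name, round(cv_accuracy(examples, majority_baseline), 2))
```

In a real experiment, `fit` would train an NN or SVM on features derived from the primary sequence; the fold logic and the pooled accuracy estimate would stay the same.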

Bibliographic record

  • Author

    Tsilo Lipontseng Cecilia

  • Year 2009
  • Original format PDF
  • Language English