Prediction of protein secondary structure using binary classification trees, naive Bayes classifiers and the Logistic Regression Classifier

Abstract

The secondary structure of proteins is predicted using various binary classifiers. The data are taken from the RS126 database and consist of protein primary and secondary structure sequences, originally encoded as alphabetic letters. These letters are re-encoded into unary vectors comprising only ones and zeros. Three binary classifiers, namely naive Bayes, logistic regression and classification trees, are trained on the encoded data and evaluated using both hold-out and 5-fold cross validation. For each classifier, three classification tasks are considered: helix against not helix (H/∼H), sheet against not sheet (S/∼S) and coil against not coil (C/∼C). The performance of these binary classifiers is compared using the overall accuracy in predicting protein secondary structure for various window sizes. Our results indicate that hold-out cross validation achieved higher accuracy than 5-fold cross validation. The naive Bayes classifier using 5-fold cross validation achieved the lowest accuracy for helix against not helix, while the classification trees using 5-fold cross validation achieved the lowest accuracies for both coil against not coil and sheet against not sheet. The accuracy of the logistic regression classifier depends on the window size; there is a positive relationship between accuracy and window size. Logistic regression achieved the highest accuracy of the three approaches on every classification task: 77.74 percent for helix against not helix, 81.22 percent for sheet against not sheet and 73.39 percent for coil against not coil. Comparing the classifiers would be easier if the whole classification process could be carried out in R; alternatively, the logistic regression classifiers would be easier to assess if SPSS had a function for computing their accuracy.
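The encoding and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the thesis's own R/SPSS workflow. The 21-symbol alphabet (20 amino acids plus a hypothetical '-' padding symbol), the Bernoulli naive Bayes variant with Laplace smoothing, the one-third hold-out fraction and the toy sequence are all assumptions made for illustration; none of them comes from the thesis or from RS126.

```python
import math
import random

# Assumed alphabet: the 20 standard amino acids plus '-' for window positions
# that fall off either end of the chain. Unknown residues (e.g. 'X') encode as all zeros.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode_windows(sequence, window):
    """One unary (one-hot) 0/1 vector per residue, for a window centred on it."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    vectors = []
    for i in range(len(sequence)):
        vec = []
        for residue in padded[i:i + window]:
            vec.extend(1 if residue == symbol else 0 for symbol in ALPHABET)
        vectors.append(vec)
    return vectors


def binary_labels(structure, target):
    """1 where the state equals target (e.g. 'H' for the H/~H task), otherwise 0."""
    return [1 if state == target else 0 for state in structure]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Class log-priors and per-feature P(x_j = 1 | class), Laplace-smoothed."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (math.log(prior), probs)
    return model


def predict_bernoulli_nb(model, x):
    """Return the class with the highest log-posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if xj else 1.0 - p) for xj, p in zip(x, probs))
    return max(model, key=score)


def count_correct(train_idx, test_idx, X, y):
    """Fit on train_idx and count correct predictions on test_idx."""
    model = train_bernoulli_nb([X[i] for i in train_idx], [y[i] for i in train_idx])
    return sum(predict_bernoulli_nb(model, X[i]) == y[i] for i in test_idx)


def hold_out_accuracy(X, y, test_fraction=1 / 3, seed=0):
    """Single random train/test split; the 1/3 test fraction is an assumption."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    return count_correct(idx[n_test:], idx[:n_test], X, y) / n_test


def k_fold_accuracy(X, y, k=5, seed=0):
    """Every sample is tested exactly once, by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        correct += count_correct([i for i in idx if i not in held_out], test_idx, X, y)
    return correct / len(X)


if __name__ == "__main__":
    # Toy sequence and structure, invented for illustration (not RS126 data).
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    ss = "CCHHHHHHHHCCSSSSCCCCCC"
    X = encode_windows(seq, window=5)
    for target in "HSC":
        y = binary_labels(ss, target)
        print(target, hold_out_accuracy(X, y), k_fold_accuracy(X, y, k=5))
```

Each residue becomes one example with 21 × w binary features for a window of size w, so larger windows expose more sequence context to the classifier. Swapping in logistic regression or a classification tree would only change the `train_bernoulli_nb`/`predict_bernoulli_nb` pair; the encoding and the two validation schemes stay the same.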

Bibliographic details

  • Author: Eldud Omer Ahmed Abdelkarim
  • Year: 2016
  • Original format: PDF
  • Language of full text: English
